Abstract

The field of property testing of probability distributions, or distribution testing, aims to 
provide fast and (most likely) correct answers to questions pertaining to specific aspects of very 
large datasets. In this work, we consider a property of particular interest, monotonicity of
distributions. We focus on the complexity of monotonicity testing across different models of access
to the distributions [CFGM13, CRS12, CR14, RS09]; and obtain results in these new settings 
that differ significantly from the known bounds in the standard sampling model [BKR04] . 
1 Introduction


Before even the advent of data, information, records and insane amounts thereof to treat and 
analyze, probability distributions have been everywhere, and understanding their properties has 
been a fundamental problem in Statistics.¹ Whether it be about the chances of winning a (possibly
rigged) game in a casino, or about predicting the outcome of the next election; or for social studies 
or experiments, or even for the detection of suspicious activity in networks, hypothesis testing and 
density estimation have had a role to play. And among these distributions, monotone ones have 
often been of paramount importance: is the probability of getting a cancer decreasing with the 
distance from, say, one’s microwave? Are aging voters more likely to vote for a specific party? Is 
the success rate in national exams correlated with the amount of money spent by the parents in 
tutoring? 

All these examples, however disparate they may seem, share one unifying aspect: data may 
be viewed as the probability distributions it defines and originates from; and understanding the 
properties of this data calls for testing these distributions. In particular, our focus here will be 
on testing whether the data - its underlying distribution - happens to be monotone,² or on the
contrary far from being so. 

Since the seminal work of Batu, Kumar, and Rubinfeld [BKR04], this fundamental property 
has been well-understood in the usual model of access to the data, which only assumes independent 
samples. However, a recent trend in distribution testing has been concerned with introducing 
and studying new models which provide additional flexibility in observing the data. In these new 
settings, our understanding of what is possible and what remains difficult is still in its infancy; and 
this is in particular true for monotonicity, for which very little is known. This work intends to 
mitigate this state of affairs. 

We hereafter assume the reader’s familiarity with the broad field of property testing, and the
more specific setting of distribution testing. For detailed surveys of the former, she or he is
referred to, for instance, [Fis01, Ron08, Ron10, Gol10]; an overview of the latter can be found
e.g. in [Rub12], or [Can15]. Details of the models we consider (besides the usual sampling
oracle setting, denoted by SAMP) are described in [CFGM13, CRS12, CRS14] (for the conditional
sampling oracle COND, and its variants INTCOND and PAIRCOND restricted respectively to
interval and pairwise queries); [BKR04, GMV06, CR14] for the Dual and Cumulative Dual models;
and [RS09] for the evaluation-only oracle, EVAL. The reader confused by the myriad of notations 
featured in the previous sentence may find the relevant definitions in Section 2 and Appendix A 
(as well as in the aforementioned papers). 

Caveat. It is worth mentioning that we do not consider here monotonicity in its full general 
setting: we shall only focus on distributions defined on the line. In particular, the (many) works 
concerned with distributions over high-dimensional posets are out of the scope of this paper. 

1.1 Results 

In this paper, we provide both upper and lower bounds for the problem of testing monotonicity,
across various types of access to the unknown distribution. A summary of results, including the
best currently known bounds on monotonicity testing of distributions, can be found in Table 1
below. As noted in Section 3, many of the lower bounds are implied by the corresponding lower
bound on testing uniformity.

¹ As well as - crucially - in crab population analysis [Pea94].

² Recall that a distribution $D$ on $\{1,\dots,n\}$ is said to be monotone (non-increasing) if $D(1) \geq \dots \geq D(n)$, i.e. if
its probability mass function is non-increasing. We hereafter denote by $\mathcal{M}$ the class of monotone distributions.


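Model             Upper bound                                Lower bound

SAMP              $\tilde{O}(\sqrt{n}/\varepsilon^6)$        $\Omega(\sqrt{n}/\varepsilon^2)$

COND              $\tilde{O}(1/\varepsilon^{22})$            […]

INTCOND           $\tilde{O}(\log^5 n/\varepsilon^4)$        $\Omega(\sqrt{\log n/\log\log n})$

EVAL              […]                                        […]

Cumulative Dual   […]                                        […]
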
Table 1: Summary of results for monotonicity testing. The highlighted ones are new; bounds with 
an asterisk* hold for non-adaptive testers. 

1.2 Techniques 

Two main ideas are followed in obtaining our upper bounds: the first one, illustrated in Section 3 
and Section 4.1, is the approach of Batu et al. [BKR04], which reduces monotonicity testing to 
uniformity testing on polylogarithmically many intervals. This relies on a structural result for 
monotone distributions which asserts that they admit a succinct partition in intervals, such that on 
each interval the distribution is either close to uniform (in $\ell_2$ distance), or puts very little weight.

The second approach, on which Section 4.2, Section 5.1 and Section 6 are based, also leverages a
structural result, due this time to Birgé [Bir87]. As before, this theorem states that each monotone
distribution admits a succinct “flat approximation,” but in this case the partition does not depend
on the distribution itself (see Section 2 for a more rigorous exposition). From there, the high-level
idea is to perform two different checks: first, that the distribution $D$ is close to its “flattening” $\bar{D}$;
and then that this flattening itself is close to monotone - where to be efficient the latter exploits
the fact that the effective support of $\bar{D}$ is very small, as there are only polylogarithmically many
intervals in the partition. If both tests succeed, then it must be the case that $D$ is close to monotone.

1.3 Organization 

The upper bounds for the conditional models, Theorem 4.1 and Theorem 4.2, are given in Section 4.
Section 5 contains details of the results in the evaluation query model: the upper bound of Theorem 5.4,
and the non-adaptive and adaptive lower bounds of respectively Theorem 5.5 and Theorem 5.7.
Finally, in Section 6 we prove Theorem 6.1, our upper bound for the Cumulative Dual access model.

We also note that, in the course of obtaining one of our upper bounds, we derive a result
previously (to the best of our knowledge) absent from the literature: namely, that learning monotone
distributions in the EVAL-only model can be accomplished using $O((\log n)/\varepsilon)$ queries (Lemma 5.2).

Finally, we show in Appendix B that some of our techniques extend to tolerant testing, and
describe in two of the models tolerant testers for monotonicity whose query complexity is only
logarithmic in $n$ (Theorem B.1 and Corollary B.3).
2 Preliminaries


Throughout this paper, we denote by $[n]$ the set $\{1,\dots,n\}$, and by $\log$ the logarithm in base 2.
A probability distribution over a (finite) domain³ $\Omega$ is a non-negative function $D\colon \Omega \to [0,1]$ such
that $\sum_{x\in\Omega} D(x) = 1$. We denote by $\mathcal{U}(\Omega)$ the uniform distribution on $\Omega$. Given a distribution $D$
over $\Omega$ and a set $S \subseteq \Omega$, we write $D(S)$ for the total probability weight $\sum_{x\in S} D(x)$ assigned to $S$
by $D$. Finally, for $S \subseteq \Omega$ such that $D(S) > 0$, we denote by $D_S$ the conditional distribution of $D$
restricted to $S$, that is $D_S(x) = D(x)/D(S)$ for $x \in S$ and $D_S(x) = 0$ otherwise.

As is usual in property testing of distributions, in this work the distance between two
distributions $D_1, D_2$ on $\Omega$ will be the total variation distance:

$$ d_{\mathrm{TV}}(D_1, D_2) = \frac{1}{2}\|D_1 - D_2\|_1 = \frac{1}{2}\sum_{i\in\Omega} |D_1(i) - D_2(i)| = \max_{S\subseteq\Omega}\,(D_1(S) - D_2(S)) \tag{1} $$

which takes value in $[0,1]$.
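
As a small illustration of the two equivalent expressions for the total variation distance, the following sketch computes both on a toy example. It assumes distributions are given explicitly as Python lists over a common domain; the function names are illustrative only, and the subset-based version is exponential-time and included purely as a sanity check.

```python
from itertools import combinations


def tv_distance(d1, d2):
    """Total variation distance as half the L1 distance between two pmfs."""
    if len(d1) != len(d2):
        raise ValueError("distributions must share the same domain")
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))


def tv_distance_via_sets(d1, d2):
    """Same quantity as the maximum over subsets S of D1(S) - D2(S)."""
    n = len(d1)
    best = 0.0  # the empty set contributes 0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            best = max(best, sum(d1[i] - d2[i] for i in subset))
    return best


d1 = [0.5, 0.3, 0.2]
d2 = [0.2, 0.3, 0.5]
print(tv_distance(d1, d2), tv_distance_via_sets(d1, d2))
```

The maximum in the second expression is attained by the set of points where the first distribution exceeds the second, which is why the two values agree.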

Models and access to the distributions. We now describe (informally) the settings we shall 
work in, which define the type of access the testing algorithms are granted to the input distribution. 
(For a formal definition of these models, the reader is referred to Appendix A.) In the first and 
most common setting (SAMP), the testers access the unknown distribution by getting independent 
and identically distributed samples from it. 

A natural extension, COND, allows the algorithm to provide a query set $S \subseteq [n]$, and get a
sample from the conditional distribution induced by $D$ on $S$: that is, the distribution $D_S$ on $S$
defined by $D_S(i) = D(i)/D(S)$. By restricting the type of allowed query sets to the class of intervals
$\{a,\dots,b\} \subseteq [n]$, one gets a weaker version of this model, INTCOND (for “interval-cond”).

Of a different flavor, providing (only) evaluation queries to the probability mass function (pmf)
(resp. to the cumulative distribution function (cdf)) of the distribution defines EVAL (resp. CEVAL)
oracle access. When the algorithm is provided with both SAMP and EVAL (resp. SAMP and
CEVAL) oracles to the distribution, we say it has Dual (resp. Cumulative Dual) access to it.
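
To make the access models concrete, here is a minimal simulation sketch. It assumes the distribution is known explicitly as a list (which a real tester of course does not have), and the class and method names are hypothetical. In particular, raising an error on a zero-probability query set is a simplifying choice of this sketch, not necessarily the convention of the formal definitions in Appendix A.

```python
import random


class DistributionOracles:
    """Toy oracles for an explicitly known pmf over [n] = {1, ..., n}."""

    def __init__(self, pmf, seed=0):
        self.pmf = list(pmf)  # pmf[i - 1] holds D(i)
        self.n = len(self.pmf)
        self.rng = random.Random(seed)

    def samp(self):
        """SAMP: one independent sample i ~ D."""
        return self.rng.choices(range(1, self.n + 1), weights=self.pmf)[0]

    def cond(self, query_set):
        """COND: one sample from D conditioned on query_set."""
        elems = sorted(query_set)
        weights = [self.pmf[i - 1] for i in elems]
        if sum(weights) <= 0:
            raise ValueError("query set has zero probability (sketch-only convention)")
        return self.rng.choices(elems, weights=weights)[0]

    def intcond(self, a, b):
        """INTCOND: COND restricted to intervals {a, ..., b}."""
        return self.cond(range(a, b + 1))

    def evaluate(self, i):
        """EVAL: the value D(i) of the pmf."""
        return self.pmf[i - 1]

    def cumulative(self, i):
        """CEVAL: the value D(1) + ... + D(i) of the cdf."""
        return sum(self.pmf[:i])


oracle = DistributionOracles([0.5, 0.25, 0.25], seed=1)
print(oracle.samp(), oracle.cond({2, 3}), oracle.evaluate(1), oracle.cumulative(2))
```

Dual access corresponds to using `samp` together with `evaluate`, and Cumulative Dual access to using `samp` together with `cumulative`.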

On the domain and parameters. Unless specified otherwise, $\Omega$ will hereafter by default be the
$n$-element set $[n]$. When stating the results, the accuracy parameter $\varepsilon \in [0,1]$ is to be understood
as taking small values, either a tiny constant or a quantity arbitrarily close to 0; however, the actual
parameter of interest will always be $n$, viewed as “going to infinity.” Hence any dependence on $n$,
no matter how mild, shall be considered as more expensive than any function of $\varepsilon$ only.

On monotone distributions. We now state a few crucial facts about monotone distributions,
namely that they admit a succinct approximation, itself monotone, close in total variation
distance:


³ For the focus of this work, all distributions will be supported on a finite domain; thus, we do not consider the
fully general definitions from measure theory. 
Definition 2.1 (Oblivious decomposition). Given a parameter $\varepsilon > 0$, the corresponding oblivious
decomposition of $[n]$ is the partition $\mathcal{I}_\varepsilon = (I_1,\dots,I_\ell)$, where $\ell = \Theta(\ln(\varepsilon n + 1)/\varepsilon) = \Theta(\log n/\varepsilon)$ and⁴

$$ |I_k| = \lfloor (1+\varepsilon)^k \rfloor, \qquad 1 \leq k \leq \ell. $$

For a distribution $D$ and parameter $\varepsilon$, define $\Phi_\varepsilon(D)$ to be the flattened distribution with relation
to the oblivious decomposition $\mathcal{I}_\varepsilon$:

$$ \forall k \in [\ell],\ \forall i \in I_k, \qquad \Phi_\varepsilon(D)(i) = \frac{D(I_k)}{|I_k|}. \tag{2} $$

Note that while $\Phi_\varepsilon(D)$ (obviously) depends on $D$, the partition $\mathcal{I}_\varepsilon$ itself does not; in particular, it
can be computed prior to getting any sample or information about $D$.


Theorem 2.2 ([Bir87]). If $D$ is monotone non-increasing, then $d_{\mathrm{TV}}(D, \Phi_\varepsilon(D)) \leq \varepsilon$.
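
The decomposition and the flattening operator can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: interval sizes are taken to be $\max(1, \lfloor (1+\varepsilon)^k \rfloor)$, the last interval is truncated at $n$, and the helper names are hypothetical. The example then checks the guarantee of Theorem 2.2 numerically on one monotone distribution (with $D(i)$ proportional to $1/i$).

```python
import math


def oblivious_decomposition(n, eps):
    """Partition [n] into consecutive intervals (a, b) with |I_k| = floor((1+eps)^k)."""
    intervals, start, k = [], 1, 1
    while start <= n:
        size = max(1, math.floor((1 + eps) ** k))
        end = min(n, start + size - 1)  # the last interval is truncated at n
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def flatten(pmf, intervals):
    """Flattened distribution: spread D(I_k) uniformly over each interval I_k."""
    flat = [0.0] * len(pmf)
    for a, b in intervals:
        average = sum(pmf[a - 1:b]) / (b - a + 1)
        for i in range(a, b + 1):
            flat[i - 1] = average
    return flat


def tv_distance(d1, d2):
    return 0.5 * sum(abs(x - y) for x, y in zip(d1, d2))


n, eps = 1000, 0.1
intervals = oblivious_decomposition(n, eps)
weights = [1.0 / i for i in range(1, n + 1)]
total = sum(weights)
pmf = [w / total for w in weights]  # monotone non-increasing
flat = flatten(pmf, intervals)
print(len(intervals), tv_distance(pmf, flat))
```

Note that `oblivious_decomposition` never looks at the distribution, which is the sense in which the partition is oblivious.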


Remark 2.3. The first use of this result in this discrete learning setting is due to Daskalakis et
al. [DDS12]. For a proof for discrete distributions (whereas the original paper by Birgé is intended
for continuous ones), the reader is referred to [DDS+13] (Section 3.1, Theorem 5).

Corollary 2.4 (Robustness). Suppose $D$ is $\varepsilon$-close to monotone non-increasing. Then $d_{\mathrm{TV}}(D, \Phi_\alpha(D)) \leq 2\varepsilon + \alpha$;
furthermore, $\Phi_\alpha(D)$ is also $\varepsilon$-close to monotone non-increasing.

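Proof. Let $P$ be a monotone non-increasing distribution such that $d_{\mathrm{TV}}(D, P) \leq \varepsilon$. By the triangle
inequality,

$$ d_{\mathrm{TV}}(D, \Phi_\alpha(D)) \leq d_{\mathrm{TV}}(D, P) + d_{\mathrm{TV}}(P, \Phi_\alpha(P)) + d_{\mathrm{TV}}(\Phi_\alpha(P), \Phi_\alpha(D)) \leq \varepsilon + \alpha + d_{\mathrm{TV}}(\Phi_\alpha(P), \Phi_\alpha(D)) $$

where the last inequality uses the assumption on $P$ and Theorem 2.2 applied to it. It only remains
to bound the last term: by definition,

$$ 2 d_{\mathrm{TV}}(\Phi_\alpha(P), \Phi_\alpha(D)) = \sum_{i=1}^{n} |\Phi_\alpha(D)(i) - \Phi_\alpha(P)(i)| = \sum_{k=1}^{\ell} \sum_{i\in I_k} |\Phi_\alpha(D)(i) - \Phi_\alpha(P)(i)| $$

$$ = \sum_{k=1}^{\ell} \sum_{i\in I_k} \left| \frac{D(I_k) - P(I_k)}{|I_k|} \right| = \sum_{k=1}^{\ell} |D(I_k) - P(I_k)| = \sum_{k=1}^{\ell} \Big| \sum_{i\in I_k} (D(i) - P(i)) \Big| $$

$$ \leq \sum_{k=1}^{\ell} \sum_{i\in I_k} |D(i) - P(i)| = 2 d_{\mathrm{TV}}(P, D) \leq 2\varepsilon $$

(showing in particular the second part of the claim, as $\Phi_\alpha(P)$ is monotone) and thus

$$ d_{\mathrm{TV}}(D, \Phi_\alpha(D)) \leq 2\varepsilon + \alpha $$

as claimed. □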

⁴ We will often ignore the floors in the definition of the oblivious partition, to avoid more cumbersome analyses
and the technicalities that would otherwise arise. However, note that this does not affect the correctness of the
proofs: after the first $O(1)$ intervals (which will be, as per the above definition, of constant size), we do have indeed
$|I_{k+1}| \in [1 + \frac{\varepsilon}{2}, 1 + 2\varepsilon] \cdot |I_k|$. This multiplicative property, in turn, is the key aspect we shall rely on.


One can interpret this corollary as saying that the Birgé decomposition provides a tradeoff
between becoming simpler (and at least as close to monotone) while not staying too far from the
original distribution.

Incidentally, the last step of the proof above implies the following easy fact: 

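Fact 2.5. For all $\alpha \in (0,1]$,

$$ d_{\mathrm{TV}}(\Phi_\alpha(P), \Phi_\alpha(D)) \leq d_{\mathrm{TV}}(P, D) \tag{3} $$

and in particular, for any property $\mathcal{P}$ preserved by the Birgé transformation (such as monotonicity)

$$ d_{\mathrm{TV}}(\Phi_\alpha(D), \mathcal{P}) \leq d_{\mathrm{TV}}(D, \mathcal{P}). \tag{4} $$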


Other tools. Finally, we will use as subroutines the following results of Canonne, Ron, and 
Servedio. The first one, restated below, provides a way to “compare” the probability weight of 
disjoint subsets of elements in the COND model: 

Lemma 2.6 ([CRS12, Lemma 2]). Given as input two disjoint subsets of points $X, Y \subseteq \Omega$ together
with parameters $\eta \in (0,1]$, $K \geq 1$, and $\delta \in (0,1/2]$, as well as COND query access to a distribution
$D$ on $\Omega$, there exists a procedure Compare that either outputs a value $\rho > 0$ or outputs High or
Low, and satisfies the following:

(i) If $D(X)/K \leq D(Y) \leq K \cdot D(X)$ then with probability at least $1 - \delta$ the procedure outputs a
value $\rho \in [1 - \eta, 1 + \eta]\, D(Y)/D(X)$;

(ii) If $D(Y) > K \cdot D(X)$ then with probability at least $1 - \delta$ the procedure outputs either High or
a value $\rho \in [1 - \eta, 1 + \eta]\, D(Y)/D(X)$;

(iii) If $D(Y) < D(X)/K$ then with probability at least $1 - \delta$ the procedure outputs either Low or a
value $\rho \in [1 - \eta, 1 + \eta]\, D(Y)/D(X)$.

The procedure performs $O\!\left(\frac{K \log(1/\delta)}{\eta^2}\right)$ COND queries on the set $X \cup Y$.

The second allows one to estimate the distance between the uniform distribution and an
unknown distribution $D$, given access to a conditional oracle to the latter:


Theorem 2.7 ([CRS12, Theorem 14]). Given as input $\varepsilon \in (0,1]$ and $\delta \in (0,1]$, as well as
PAIRCOND query access to a distribution $D$ on $\Omega$, there exists an algorithm that outputs a value $\hat{d}$ and
has the following guarantee. The algorithm performs $\tilde{O}(1/\varepsilon^{20} \cdot \log(1/\delta))$ queries and, with probability
at least $1 - \delta$, the value it outputs satisfies $|\hat{d} - d_{\mathrm{TV}}(D, \mathcal{U})| \leq \varepsilon$.


3 Previous work: Standard model 

In this section, we describe the currently known results for monotonicity testing in the standard 
(sampling) oracle model. These bounds on the sample complexity, tight up to logarithmic factors, 
are due to Batu et al. [BKR04];⁵ while not directly applicable to the other access models we will
consider, we note that some of the techniques they use will be of interest to us in Section 4.1.

⁵ [BKR04] originally claims an $\tilde{O}(\sqrt{n}/\varepsilon^4)$ sample complexity, but their argument seems to only result in an
$\tilde{O}(\sqrt{n}/\varepsilon^6)$ bound. Subsequent work building on their techniques [CDGR15] obtains the $\varepsilon^4$ dependence.
Theorem 3.1 ([BKR04, Theorem 10]). There exists an $O\!\left(\frac{\sqrt{n}}{\varepsilon^6} \operatorname{polylog} n\right)$-query tester for
monotonicity in the SAMP model.

Proof (sketch). Their algorithm works by taking this many samples from $D$, and then using them
to recursively split the domain $[n]$ in half, as long as the conditional distribution on the current
interval is not close enough to uniform (or not enough samples fall into it). If the binary tree
created during this recursive process exceeds $O(\log^2 n/\varepsilon)$ nodes, the tester rejects. Batu et al.
then show that this succeeds with high probability, the leaves of the recursion yielding a partition
of $[n]$ in $\ell_{\max} = O(\log^2 n/\varepsilon)$ intervals $I_1,\dots,I_{\ell_{\max}}$, such that either

(a) the conditional distribution $D_{I_j}$ is $O(\varepsilon)$-close to uniform on this interval; or

(b) $I_j$ is “light,” i.e. has weight at most $O(\varepsilon/\ell_{\max})$ under $D$

(the first item relying on a lemma from [BFR+00] relating distance to uniformity and collision
count⁶). This implies this partition defines an $\ell_{\max}$-flat distribution $\bar{D}$ which is $\varepsilon/2$-close to $D$,
and can be easily learnt from another batch of samples; once this is done, it only remains to test
(e.g., via linear programming, which can be done efficiently) whether this $\bar{D}$ is itself $\varepsilon/2$-close to
monotone, and accept if and only if this is the case. □
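
The link between collision counts and distance to uniformity used in item (a) can be illustrated with a short sketch. It relies on the standard identity $\|D - \mathcal{U}\|_2^2 = \|D\|_2^2 - 1/|I|$ for a distribution $D$ over a domain $I$, and on the fact that the fraction of colliding sample pairs is an unbiased estimate of $\|D\|_2^2$. The function names are illustrative, and the sketch does not reproduce the exact thresholds or sample sizes of [BKR04] or [BFR+00].

```python
import random
from collections import Counter


def collision_estimate(samples):
    """Fraction of sample pairs that collide: unbiased estimate of ||D||_2^2."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return collisions / (m * (m - 1) / 2)


def l2_sq_distance_to_uniform(samples, domain_size):
    """Estimate of ||D - U||_2^2 = ||D||_2^2 - 1/domain_size."""
    return collision_estimate(samples) - 1.0 / domain_size


rng = random.Random(0)
uniform_samples = [rng.randrange(20) for _ in range(2000)]
skewed_samples = rng.choices(range(20), weights=[10] + [1] * 19, k=2000)
print(l2_sq_distance_to_uniform(uniform_samples, 20))
print(l2_sq_distance_to_uniform(skewed_samples, 20))
```

For the uniform samples the estimate concentrates around 0, while for the skewed ones it is bounded away from 0; a tester thresholds this quantity on each interval of the recursion.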

Theorem 3.2 ([BKR04, Theorem 11]). Any tester for monotonicity in the SAMP model must
perform $\Omega(\sqrt{n}/\varepsilon^2)$ queries.⁷

Proof (sketch). To prove this lower bound, they reduce the problem of uniformity testing to
monotonicity testing: from a distribution $D$ over $[n]$ (where $n$ is for the sake of simplicity assumed to
be even), one can run a monotonicity tester (with parameter $\varepsilon' = \varepsilon/3$) on both $D$ and $\overleftarrow{D}$, where
the latter is defined as $\overleftarrow{D}(i) = D(n + 1 - i)$, $i \in [n]$; and accept if and only if both tests pass. If $D$
is uniform, clearly $D = \overleftarrow{D}$ is monotone; conversely, one can show that if both $D$ and its “mirrored
version” $\overleftarrow{D}$ pass the test (are $\varepsilon'$-close to monotone non-increasing), then it must be the case that $D$
is $\varepsilon$-close to uniform. The lower bound then follows from the $\Omega(\sqrt{n}/\varepsilon^2)$ lower bound of [Pan08] for
testing uniformity. □
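
The structure of this reduction can be sketched in a few lines. The sketch makes two simplifying assumptions that are not part of the argument above: the distribution is given as an explicit list, and an exact monotonicity check stands in for a sample-based tester run with parameter $\varepsilon/3$. The mirror image is taken to move the mass of point $i$ to point $n + 1 - i$; all names are hypothetical.

```python
def mirror(pmf):
    """Mirrored distribution: point i receives the mass of point n + 1 - i."""
    return pmf[::-1]


def is_monotone_non_increasing(pmf, tol=1e-12):
    """Exact check, used here as a stand-in for a monotonicity tester."""
    return all(pmf[i] + tol >= pmf[i + 1] for i in range(len(pmf) - 1))


def uniformity_via_monotonicity(pmf, monotonicity_test=is_monotone_non_increasing):
    """Accept iff the monotonicity tester accepts both D and its mirror image."""
    return monotonicity_test(pmf) and monotonicity_test(mirror(pmf))


print(uniformity_via_monotonicity([0.25, 0.25, 0.25, 0.25]))
print(uniformity_via_monotonicity([0.4, 0.3, 0.2, 0.1]))
```

With an exact check, only the uniform distribution is both non-increasing and non-decreasing; the quantitative version of this statement is what the proof sketch establishes for testers.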

We note that the argument above extends to all models: that is, any lower bound for testing 
uniformity directly implies a corresponding lower bound for monotonicity in the same access model 
(giving the bounds in Table 1). 

Open question. At a (very) high-level, the above results can be interpreted as “relating
monotonicity to uniformity.” That is, the upper bound is essentially established by proving that
monotonicity reduces to testing uniformity on polylogarithmically many intervals, while the lower bound
follows from showing that it reduces from testing uniformity on a constant number of them. Thus,
an interesting question is whether, qualitatively, the former or the latter is tight in terms of $n$. Are
uniformity and monotonicity strictly as hard, or is there an intrinsic gap, even if only
polylogarithmic, between the two?

⁶ We observe that the dependence on $\varepsilon$ could be brought down to $\varepsilon^4$, by using instead machinery from [DKN15,
Theorem 11] to perform this step.

⁷ [BKR04] actually only shows an $\Omega(\sqrt{n})$ lower bound, as they invoke in the last step the (previously best known)
lower bound of [GR00] for uniformity testing; however, their argument straightforwardly extends to the result of
Paninski.

Question 3.3. Can monotonicity be tested in the SAMP model with $O(\sqrt{n})$ samples, or are $\Omega(\sqrt{n}\log^c n)$
needed for some absolute constant $c > 0$?


4 With conditional samples 

In this section, we focus on testing monotonicity with a stronger type of access to the underlying 
distribution, that is given the ability to ask conditional queries. More precisely, we prove the 
following theorem: 

Theorem 4.1. There exists an $\tilde{O}(1/\varepsilon^{22})$-query tester for monotonicity in the COND model.

Furthermore, assuming only a (restricted) type of conditional queries are allowed, one can still get
an exponential improvement from the standard sampling model:

Theorem 4.2. There exists an $\tilde{O}(\log^5 n/\varepsilon^4)$-query tester for monotonicity in the INTCOND model.

We now prove these two theorems, starting with Theorem 4.2. In doing so, we will also derive a
weaker, $\mathrm{poly}(\log n, 1/\varepsilon)$-query tester for COND, before turning in Section 4.2 to the constant-query
tester of Theorem 4.1.


4.1 A poly(log n, 1/ε)-query tester for INTCOND

Our algorithm (Algorithm 1) follows the same overall idea as the one from [BKR04], with a major
difference. As in theirs, the first step will be to partition $[n]$ into a small number of intervals, such
that the conditional distribution $D_I$ on each interval $I$ is close to uniform; that is,

$$ d_{\mathrm{TV}}(D_I, \mathcal{U}_I) = \frac{1}{2} \sum_{i\in I} \left| \frac{D(i)}{D(I)} - \frac{1}{|I|} \right| \leq \frac{\varepsilon}{4}. \tag{5} $$

The original approach (in the sampling model) of Batu et al. was based on estimating the $\ell_2$ norm
of the conditional distribution via the number of collisions from a sufficiently large sample; this
yielded an $\tilde{O}(\sqrt{n})$ sample complexity.

However, using directly as a subroutine (in the COND model) an algorithm for (tolerantly)
testing uniformity, one can perform this first step with $\ell_{\max} \log\frac{1}{\delta} = O(\ell_{\max} \log \ell_{\max})$ calls⁸ to this
subroutine, each with approximation parameter $\frac{\varepsilon}{4}$ (the proof of correctness from [BKR04] does not
depend on how the test of uniformity is actually performed, in the partitioning step).

A first idea would be to use for this the following result: 

Fact 4.3 ([CRS14]). One can test $\varepsilon$-uniformity of a distribution $D_r$ over $[r]$ in the conditional
sampling model:

• with $\tilde{O}(1/\varepsilon^2)$ samples, given access to a COND$_{D_r}$ oracle;

• with $\tilde{O}(\log^3 r/\varepsilon^3)$ samples, given access to an INTCOND$_{D_r}$ oracle.

⁸ Where the logarithmic dependence on $\delta$ aims at boosting the (constant) success probability of the uniformity
testing algorithm, in order to apply a union bound over the $O(\ell_{\max})$ calls.


However, this does not suffice for our purpose: indeed, Algorithm 1 needs in Step 6 not only to
reject distributions that are too far from uniform, but also to accept those that are close enough.
A standard uniformity tester such as the one above does not ensure the latter condition: for this, one
would a priori need a tolerant tester for uniformity. While [CRS14] does describe such a tolerant
tester (see Theorem 2.7), it only applies to COND - and we aim at getting an INTCOND tester.

To resolve this issue, we observe that what the algorithm requires is slightly weaker: namely, to
distinguish distributions on an interval $I$ that (a) are $\Omega(\varepsilon)$-far from uniform from those that are (b)
$O(\varepsilon/|I|)$-close to uniform in $\ell_\infty$ distance. It is not hard to see that the two testers of Fact 4.3 can
be adapted in a straightforward fashion to meet this guarantee, with the same query complexity.
Indeed, (b) is equivalent to asking that the ratio $D(x)/D(y)$ of any two points in $I$ be in $[1 - \varepsilon, 1 + \varepsilon]$,
which is exactly what both testers check.


Algorithm 1 General algorithm TestMonCond$^{\mathcal{O}}$
Require: $\mathcal{O} \in \{\mathrm{COND}, \mathrm{INTCOND}\}$ access to $D$
 1: Define $\ell_{\max} = O(\log^2 n/\varepsilon)$, $\delta = O(1/\ell_{\max})$.
 2: Draw $m$ = […] samples $h_1,\dots,h_m$.
 3: PartitionStart
 4:   Start with interval $I \leftarrow [n]$
 5:   repeat
 6:     Test (with probability $\geq 1 - \delta$) if $D_I$ is $\varepsilon/4$-close to the uniform distribution on $I$
 7:     if $d_{\mathrm{TV}}(D_I, \mathcal{U}_I) > \frac{\varepsilon}{4}$ then
 8:       bisect $I$ in half
 9:       recursively test each half that contains at least one of the $h_i$'s, mark them as “light” otherwise
10:     else if $\ell_{\max}$ splits have been made then
11:       return FAIL
12:     end if
13:   until all intervals are close to uniform or have been marked “light”
14: PartitionEnd
15: Let $\mathcal{I} = (I_1,\dots,I_\ell)$ denote the partition of $[n]$ into intervals induced by the leaves of the
    recursion from the previous step.
16: Obtain an additional sample $T$ of size […].
17: Let $\bar{D}$ denote the $\ell$-flat distribution described by $(w, \mathcal{I})$ where $w_j$ is the fraction of samples
    from $T$ falling in $I_j$.
18: if $\bar{D}$ is $(\varepsilon/2)$-close to monotone then    ▷ Can be tested in poly($\ell$)-time ([BKR04, Lemma 8])
19:   return ACCEPT
20: else
21:   return FAIL
22: end if



As a corollary, we get: 

Corollary 4.4. Given access to a conditional oracle $\mathcal{O}$ for a distribution $D$ over $[n]$, the algorithm
TestMonCond$^{\mathcal{O}}$ outputs ACCEPT when $D$ is monotone and FAIL when it is $\varepsilon$-far from monotone,
with probability at least 2/3. The algorithm uses

• O([…]) = […] samples, when $\mathcal{O} = \mathrm{COND}_D$;

• O([…]) = $\tilde{O}(\log^5 n/\varepsilon^4)$ samples, when $\mathcal{O} = \mathrm{INTCOND}_D$.

This in turn implies Theorem 4.2. Note that we make sure in Step 9 that each of the intervals
we recurse on contains at least one of the “reference samples” $h_i$; this is in order to guarantee that all
conditional queries are made on sets with non-zero probability. Discarding the “light intervals” can be
done without compromising the correctness, as with high probability each of them has probability
weight at most $\frac{\varepsilon}{4\ell_{\max}}$, and therefore in total the light intervals can amount to at most $\varepsilon/4$ of the
probability weight of $D$ - as in the original argument of Batu et al., we can still conclude that with
high probability $\bar{D}$ is $\varepsilon/2$-close to $D$.

4.2 A poly(1/ε)-query tester for COND

The idea in proving Theorem 4.1 is to reduce the task of testing monotonicity to another property,
but on a (related) distribution over a much smaller domain. We begin by introducing a few
notations, and defining the said property:

4.2.1 Reduction from testing properties over $[\ell]$

For fixed $\alpha$ and $D$, let $D_\alpha^{\mathrm{red}}$ be the reduced distribution on $[\ell]$ with respect to the oblivious
decomposition $\mathcal{I}_\alpha$, where all throughout $\ell = \ell(\alpha, n)$ as per Definition 2.1; i.e.,

$$ \forall k \in [\ell], \qquad D_\alpha^{\mathrm{red}}(k) = D(I_k) = \Phi_\alpha(D)(I_k). $$

Note that given oracle access SAMP$_D$, it is easy to simulate SAMP$_{D_\alpha^{\mathrm{red}}}$.

Definition 4.5 (Exponential Property). Fix $n$, $\alpha$, and the corresponding $\ell = \ell(n, \alpha)$. For
distributions over $[\ell]$, let the property $\mathcal{P}_\alpha$ be defined as “$Q \in \mathcal{P}_\alpha$ if and only if there exists $D \in \mathcal{M}$ over
$[n]$ such that $Q = D_\alpha^{\mathrm{red}}$.”

Fact 4.6. Given a distribution $Q$ over $[\ell]$, let $\mathrm{expand}_\alpha(Q)$ denote the distribution over $[n]$ obtained
by “spreading” uniformly $Q(k)$ over $I_k$ (again, considering the oblivious decomposition of $[n]$ for
$\alpha$). Then,

$$ Q \in \mathcal{P}_\alpha \iff \mathrm{expand}_\alpha(Q) \in \mathcal{M}. \tag{6} $$

Fact 4.7. Given a distribution $Q$ over $[\ell]$, the following also holds:⁹

$$ Q \in \mathcal{P}_\alpha \iff \forall k < \ell,\quad Q(k+1) \leq (1 + \alpha)\, Q(k). \tag{7} $$

Remark 4.8. It follows from Fact 4.6 that, for $D$ over $[n]$,

$$ \Phi_\alpha(D) \in \mathcal{M} \iff D_\alpha^{\mathrm{red}} \in \mathcal{P}_\alpha. \tag{8} $$
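
The reduced distribution and the expand operation can be sketched as follows, under the same assumptions as before (explicit pmfs as lists, interval sizes $\max(1, \lfloor (1+\alpha)^k \rfloor)$ with the last interval truncated at $n$, hypothetical helper names). The example checks numerically that expanding the reduction of a monotone distribution yields a monotone distribution, while this fails for an increasing one.

```python
import math


def oblivious_decomposition(n, alpha):
    """Consecutive intervals (a, b) of [n] with |I_k| = floor((1+alpha)^k)."""
    intervals, start, k = [], 1, 1
    while start <= n:
        size = max(1, math.floor((1 + alpha) ** k))
        end = min(n, start + size - 1)
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def reduce_distribution(pmf, intervals):
    """Reduced distribution on [l]: its k-th entry is D(I_k)."""
    return [sum(pmf[a - 1:b]) for a, b in intervals]


def expand(q, intervals, n):
    """Spread each Q(k) uniformly over the interval I_k."""
    expanded = [0.0] * n
    for mass, (a, b) in zip(q, intervals):
        for i in range(a, b + 1):
            expanded[i - 1] = mass / (b - a + 1)
    return expanded


def is_monotone_non_increasing(pmf, tol=1e-12):
    return all(pmf[i] + tol >= pmf[i + 1] for i in range(len(pmf) - 1))


n, alpha = 500, 0.2
intervals = oblivious_decomposition(n, alpha)
decreasing = [1.0 / i for i in range(1, n + 1)]
increasing = [float(i) for i in range(1, n + 1)]
decreasing = [w / sum(decreasing) for w in decreasing]
increasing = [w / sum(increasing) for w in increasing]

q_dec = reduce_distribution(decreasing, intervals)
q_inc = reduce_distribution(increasing, intervals)
print(is_monotone_non_increasing(expand(q_dec, intervals, n)))
print(is_monotone_non_increasing(expand(q_inc, intervals, n)))
```

Composing `reduce_distribution` with `expand` gives exactly the flattening of the input, which is why monotonicity of the flattening can be read off the reduced distribution.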

We shall also use the following result on flat distributions (adapted from [BKR04, Lemma 7]): 

⁹ We point out that the equivalence stated here once again ignores, for the sake of conceptual clarity, technical
details arising from the discrete setting. Taking these into account would yield a slightly weaker characterization, 
with a twofold implication instead of an equivalence; which would still be good enough for our purpose. 
Fact 4.9. $\Phi_\alpha(D)$ is $\varepsilon$-close to monotone if and only if it is $\varepsilon$-close to an $\mathcal{I}_\alpha$-flat monotone distribution
(that is, a monotone distribution piecewise constant, according to the same partition $\mathcal{I}_\alpha$).

Proof. The sufficient condition is trivial; for the necessary one, assume $\Phi_\alpha(D)$ is $\varepsilon$-close to
monotone, and let $Q$ be a monotone distribution proving it. We show that $d_{\mathrm{TV}}(\Phi_\alpha(D), \Phi_\alpha(Q)) \leq \varepsilon$:

$$ 2 d_{\mathrm{TV}}(\Phi_\alpha(D), \Phi_\alpha(Q)) = \sum_{k=1}^{\ell} |\Phi_\alpha(D)(I_k) - \Phi_\alpha(Q)(I_k)| = \sum_{k=1}^{\ell} |\Phi_\alpha(D)(I_k) - Q(I_k)| $$

$$ = \sum_{k=1}^{\ell} \Big| \sum_{i\in I_k} (\Phi_\alpha(D)(i) - Q(i)) \Big| \leq \sum_{k=1}^{\ell} \sum_{i\in I_k} |\Phi_\alpha(D)(i) - Q(i)| $$

$$ = \sum_{i=1}^{n} |\Phi_\alpha(D)(i) - Q(i)| = 2 d_{\mathrm{TV}}(\Phi_\alpha(D), Q) \leq 2\varepsilon. $$

□

Observe that Fact 4.9, Remark 4.8 and Fact 4.6 altogether imply that, for $\mathcal{I}_\alpha$-flat distributions,
distance to monotonicity and distance to $\mathcal{P}_\alpha$ of the reduced distribution are equal.


4.2.2 Efficient approximation of distance to $\Phi_\alpha(D)$


Lemma 4.10. Given COND access to a distribution $D$ over $[n]$, there exists an algorithm that, on
input $\alpha$ and $\varepsilon, \delta \in (0,1]$, makes $\tilde{O}\!\left(\frac{1}{\varepsilon^{22}} \log\frac{1}{\delta}\right)$ queries (independent of $\alpha$) and outputs a value $\hat{d}$ such
that, with probability at least $1 - \delta$,

$$ |\hat{d} - d_{\mathrm{TV}}(D, \Phi_\alpha(D))| \leq \varepsilon. $$

Proof. We describe such an algorithm for a constant probability of success; boosting the success
probability to $1 - \delta$ at the price of a multiplicative $\log\frac{1}{\delta}$ factor can then be achieved by standard
techniques (repetition, and taking the median value). Let $D$, $\varepsilon$ and $\mathcal{I}_\alpha$ be defined as before; define
$Z$ to be a random variable taking values in $[0,1]$, such that, for $k \in [\ell]$, $Z$ is equal to $d_{\mathrm{TV}}(D_{I_k}, \mathcal{U}_{I_k})$
with probability $w_k = D(I_k)$. It follows that

$$ \mathbb{E} Z = \sum_{k=1}^{\ell} w_k\, d_{\mathrm{TV}}(D_{I_k}, \mathcal{U}_{I_k}) = \frac{1}{2} \sum_{k=1}^{\ell} w_k \sum_{i\in I_k} \left| \frac{D(i)}{D(I_k)} - \frac{1}{|I_k|} \right| $$

$$ = \frac{1}{2} \sum_{k=1}^{\ell} \sum_{i\in I_k} \left| D(i) - \frac{D(I_k)}{|I_k|} \right| = \frac{1}{2} \sum_{i=1}^{n} |D(i) - \Phi_\alpha(D)(i)| = d_{\mathrm{TV}}(D, \Phi_\alpha(D)). \tag{9} $$

Putting aside for now the fact that we only have (using as a subroutine the COND algorithm
from Theorem 2.7 to estimate the distance to uniformity) access to additive approximations of the
$d_{\mathrm{TV}}(D_{I_k}, \mathcal{U}_{I_k})$'s, one can simulate independent draws from $Z$ by taking each time a fresh sample
$i \sim D$, looking up the $k$ for which $i \in I_k$, and calling the COND subroutine to get the corresponding
value. Applying a Chernoff bound, only $O(1/\varepsilon^2)$ such draws are needed, each of them costing
$\tilde{O}(1/\varepsilon^{20})$ COND queries.
Dealing with approximation. It suffices to estimate $\mathbb{E} Z$ within an additive $\varepsilon/2$, which can be
done with probability 9/10 by simulating $m = O(1/\varepsilon^2)$ samples from $Z$. To get each sample, for
the index $k$ drawn we can call the COND subroutine with parameters $\varepsilon/2$ and $\delta = 1/(10m)$ to
obtain an estimate of $d_{\mathrm{TV}}(D_{I_k}, \mathcal{U}_{I_k})$. By a union bound we get that, with probability at least 9/10,
all estimates are within an additive $\varepsilon/2$ of the true value, incurring only an $O(\log(1/\varepsilon))$ additional
factor in the overall sample complexity $\tilde{O}(1/\varepsilon^{20})$. Conditioned on this, we get that the approximate
value we compute instead of $\mathbb{E} Z$ is off by at most $\varepsilon/2 + \varepsilon/2 = \varepsilon$ (where the first term corresponds
to the approximation of the value of $Z$ for each draw, and the second comes from the additive
approximation of $\mathbb{E} Z$ by sampling). □
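
The sampling strategy of this proof can be sketched as follows. The sketch deliberately simplifies one step: instead of the COND subroutine of Theorem 2.7, it computes the distance of each conditional distribution to uniform exactly from an explicit pmf, so it only illustrates the identity $\mathbb{E} Z = d_{\mathrm{TV}}(D, \Phi_\alpha(D))$ and the averaging step. The helper names, the toy distribution and the number of draws are illustrative assumptions.

```python
import math
import random


def oblivious_decomposition(n, alpha):
    """Consecutive intervals (a, b) of [n] with |I_k| = floor((1+alpha)^k)."""
    intervals, start, k = [], 1, 1
    while start <= n:
        size = max(1, math.floor((1 + alpha) ** k))
        end = min(n, start + size - 1)
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def distance_to_uniform(pmf, a, b):
    """Exact d_TV between D conditioned on {a..b} and the uniform distribution on {a..b}."""
    mass = sum(pmf[a - 1:b])
    size = b - a + 1
    return 0.5 * sum(abs(pmf[i - 1] / mass - 1.0 / size) for i in range(a, b + 1))


def estimate_distance_to_flattening(pmf, alpha, num_draws, rng):
    """Average of num_draws independent copies of Z."""
    n = len(pmf)
    interval_of = {}
    for a, b in oblivious_decomposition(n, alpha):
        for i in range(a, b + 1):
            interval_of[i] = (a, b)
    draws = rng.choices(range(1, n + 1), weights=pmf, k=num_draws)  # fresh samples i ~ D
    return sum(distance_to_uniform(pmf, *interval_of[i]) for i in draws) / num_draws


def exact_distance_to_flattening(pmf, alpha):
    total = 0.0
    for a, b in oblivious_decomposition(len(pmf), alpha):
        average = sum(pmf[a - 1:b]) / (b - a + 1)
        total += sum(abs(pmf[i - 1] - average) for i in range(a, b + 1))
    return 0.5 * total


weights = [3.0 if i % 2 else 1.0 for i in range(1, 201)]  # zigzag, far from flat
norm = sum(weights)
pmf = [w / norm for w in weights]
estimate = estimate_distance_to_flattening(pmf, 0.5, 4000, random.Random(0))
exact = exact_distance_to_flattening(pmf, 0.5)
print(estimate, exact)
```

Only intervals with positive mass can be hit by a sample, so the conditional distributions queried are always well defined.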

4.2.3 The algorithm 


Algorithm 2 Algorithm TestMonCond
Require: COND access to $D$

1: Simulating COND$_{D_\alpha^{\mathrm{red}}}$, check if $\Phi_\alpha(D)$ is $(\varepsilon/4)$-close to monotone by testing $(\varepsilon/4)$-farness (of
   $D_\alpha^{\mathrm{red}}$) to $\mathcal{P}_\alpha$; return FAIL if not.

2: Test whether $\Phi_\alpha(D)$ is $(\varepsilon/4)$-close to $D$ using the sampling approach discussed above; return
   FAIL if not.

3: return ACCEPT


The tester is described in Algorithm 2. The second step, as argued in Lemma 4.10, uses $\tilde{O}(1/\varepsilon^{22})$
samples; we will show in Section 4.2.4 that efficiently testing $\varepsilon$-farness to $\mathcal{P}_\alpha$ is also achievable with
$O(1/\varepsilon^6)$ COND queries - concluding the proof of Theorem 4.1.

Correctness of Algorithm 2. Assume we can efficiently perform the two steps, and condition
on their execution being correct (as each of them is run with for instance parameter $\delta = 1/10$, this
happens with probability at least 3/4).

• If $D$ is monotone non-increasing, so is $\Phi_\alpha(D)$; by Remark 4.8, this means that $D_\alpha^{\mathrm{red}} \in \mathcal{P}_\alpha$
holds, and the first step passes. Theorem 2.2 then ensures that $D$ and $\Phi_\alpha(D)$ are $\alpha$-close, and
the algorithm outputs ACCEPT;

• If $D$ is $\varepsilon$-far from monotone, then either (a) $\Phi_\alpha(D)$ is $\frac{\varepsilon}{2}$-far from monotone or (b) $d_{\mathrm{TV}}(D, \Phi_\alpha(D)) >
\frac{\varepsilon}{2}$; if (b) holds, no matter how the algorithm behaves in the first step, the algorithm will not go
further than the second step, and will output FAIL. Assume now that (b) does not hold, i.e. only
(a) is satisfied. By putting together Fact 4.9, Remark 4.8 and Fact 4.6, we conclude that (a)
implies that $D_\alpha^{\mathrm{red}}$ is $\frac{\varepsilon}{2}$-far from $\mathcal{P}_\alpha$, and the algorithm outputs FAIL in the first step.

4.2.4 Testing $\varepsilon$-farness to $\mathcal{P}_\alpha$

To achieve this objective, we begin with the following lemma, which relates the distance between a
distribution $Q$ and $\mathcal{P}_\alpha$ to the total weight of points that violate the property.

Lemma 4.11. Let $Q$ be a probability distribution over $[\ell]$, and $W = \{\, i : Q(i) > (1 + \alpha) Q(i - 1) \,\}$
be the set of witnesses (points which violate the property). Then, the distance from $Q$ to the property
$\mathcal{P}_\alpha$ is $O(1/\alpha)\, Q(W)$.
Proof. One can define the procedure Fixup$_\alpha$ which, given a distribution $Q$ and the corresponding
$W$, acts as follows:

Ensure: Fixup$_\alpha(Q)$ is a distribution satisfying $\mathcal{P}_\alpha$
$Q' \leftarrow Q$, $W' \leftarrow W$
while $W' \neq \emptyset$ do
    Let $i > 1$ be the smallest (leftmost) point in $W'$, set $\Delta \leftarrow 0$ and $d \leftarrow 1$.
    while $i - d \geq 1$ and $Q'(i - d) < Q'(i)(1 + \alpha)^{-d}$ do    ▷ Increase the weight of the predecessors of $i$
        $\Delta \leftarrow \Delta + (Q'(i)(1 + \alpha)^{-d} - Q'(i - d))$
        $Q'(i - d) \leftarrow Q'(i)(1 + \alpha)^{-d}$
        $d \leftarrow d + 1$
    end while
    while $\Delta > 0$ do    ▷ Remove this weight from the rightmost points
        Let $k$ be the rightmost point with $Q'(k) > 0$, and set $\mu \leftarrow \min(\Delta, Q'(k))$
        $Q'(k) \leftarrow Q'(k) - \mu$
        $\Delta \leftarrow \Delta - \mu$
    end while
    Recompute $W'$ from $Q'$
end while
return $Q'$

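If $\Delta$ denotes the total probability weight reassigned (i.e., the sum of the $\Delta_i$'s, where $\Delta_i$ is the total
weight reassigned for witness $i$), then we have that
\[ 2 d_{\rm TV}(Q, \mathrm{Fixup}_\alpha(Q)) = \sum_{i=1}^{\ell} \left| \mathrm{Fixup}_\alpha(Q)(i) - Q(i) \right| \le \sum_{i \in W} 2\Delta_i = 2\Delta; \]
and since
\[ \Delta \le \sum_{i \in W} Q(i) \cdot \left( \frac{1}{1+\alpha} + \frac{1}{(1+\alpha)^2} + \cdots \right) \le Q(W) \cdot \frac{1}{\alpha}, \]
we get that $d_{\rm TV}(Q, \mathrm{Fixup}_\alpha(Q)) = O\!\left(\frac{1}{\alpha}\right) Q(W)$ (and $\mathrm{Fixup}_\alpha(Q)$ clearly satisfies $\mathcal{P}_\alpha$). □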
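The fix-up procedure from the proof above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the names `witnesses`, `fixup` and `tv_distance`, the 0-indexed list representation, and the floating-point slack `TOL` are choices of this example and not part of the original text. A single left-to-right scan is used in place of recomputing the witness set, which visits the same leftmost witnesses in the same order.

```python
TOL = 1e-12  # floating-point slack; an assumption of this sketch


def witnesses(q, alpha):
    """0-based indices i >= 1 with q[i] > (1 + alpha) * q[i - 1]."""
    return [i for i in range(1, len(q)) if q[i] > (1 + alpha) * q[i - 1] + TOL]


def fixup(q, alpha):
    """Return q' with q'[i] <= (1 + alpha) * q'[i - 1] for all i, same total mass."""
    q = list(q)
    for i in range(1, len(q)):
        # Everything left of i already satisfies the property, so a violating i
        # is the leftmost remaining witness.
        if q[i] <= (1 + alpha) * q[i - 1] + TOL:
            continue
        delta, d = 0.0, 1
        # Raise the predecessors of i up to q[i] / (1 + alpha)^d.
        while i - d >= 0 and q[i - d] < q[i] / (1 + alpha) ** d:
            target = q[i] / (1 + alpha) ** d
            delta += target - q[i - d]
            q[i - d] = target
            d += 1
        # Remove the same amount of weight from the rightmost points.
        k = len(q) - 1
        while delta > 0 and k >= 0:
            moved = min(delta, q[k])
            q[k] -= moved
            delta -= moved
            if q[k] == 0:
                k -= 1
    return q


def tv_distance(p, q):
    """Total variation distance between two pmfs given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

For instance, on `q = [0.1, 0.4, 0.3, 0.2]` with `alpha = 0.5` the only witness is the second point; the sketch raises the first point and takes the same weight off the last one, staying within the $Q(W)/\alpha$ budget of Lemma 4.11.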

Remark 4.12. Lemma 4.11 implies that when $Q$ is $\varepsilon$-far from having the property, it suffices to
sample points according to $Q$ and compare them to their neighbors to detect a violation
with high probability. Note that this last test would be easy, granted access to an exact EVAL
oracle; for the purpose of this section, however, we can only use an approximate one. The lemma
below addresses this issue, by ensuring that there will be many points “patently” violating the
property.
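The "easy" test mentioned in the remark, with an exact EVAL oracle, can be sketched as follows. This is a hedged illustration and not the approximate, Compare-based tester of this section: the function name `neighbor_violation_test`, the 0-indexed pmf list, and the explicit `num_samples` parameter are assumptions of this example (the sample size needed is not fixed here).

```python
import random


def neighbor_violation_test(q, alpha, num_samples, rng):
    """Sample points from q and reject if some sample is a witness.

    q is a pmf given as a list; the point with 0-based index i is a witness
    when q[i] > (1 + alpha) * q[i - 1]. Exact evaluation of q is assumed.
    """
    indices = list(range(len(q)))
    for i in rng.choices(indices, weights=q, k=num_samples):
        if i >= 1 and q[i] > (1 + alpha) * q[i - 1]:
            return "FAIL"
    return "ACCEPT"
```

A distribution satisfying the property is always accepted, since no sample can be a witness; the rejection probability on a far distribution grows with the weight $Q(W)$ of the witnesses and the number of samples.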

Lemma 4.13. Let $Q$ be as above, and, for $\tau > 0$, let $W_\tau = \{\, i : Q(i) > (1+\alpha+\tau)Q(i-1) \,\}$ be
the set of $\tau$-witnesses (so that $W = \bigcup_{\tau > 0} W_\tau$). Then, the distance from $Q$ to the property $\mathcal{P}_\alpha$ is at
most $O(1/(\alpha+\tau))\,Q(W_\tau) + O(\tau/\alpha^2)$.

Corollary 4.14. Taking $\alpha = \Theta(\varepsilon)$ and $\tau = \varepsilon\alpha^2$, we get that if $Q(W_\tau) < \varepsilon^2$, then $Q$ is $O(\varepsilon)$-close
to $\mathcal{P}_\alpha$.
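To see how the corollary follows from Lemma 4.13, one can substitute the stated parameters into the bound of the lemma. The short computation below is a worked check (using $\alpha + \tau \ge \alpha$ and $\alpha = \Theta(\varepsilon)$), not an additional claim:

```latex
\begin{align*}
d_{\rm TV}(Q, \mathcal{P}_\alpha)
  &\le O\!\left(\frac{1}{\alpha+\tau}\right) Q(W_\tau) + O\!\left(\frac{\tau}{\alpha^2}\right) \\
  &\le O\!\left(\frac{1}{\alpha}\right)\varepsilon^2 + O\!\left(\frac{\varepsilon\alpha^2}{\alpha^2}\right)
   = O(\varepsilon) + O(\varepsilon) = O(\varepsilon).
\end{align*}
```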

Proof of Lemma 4.13. We first apply the “fix-up” as defined in the proof of Lemma 4.11 to get $Q'$
such that $Q'(i) \le (1+\alpha+\tau)Q'(i-1)$ for all $i$, at a cost of $O\!\left(\frac{1}{\alpha+\tau}\right) Q(W_\tau)$. Next, we obtain a
distribution $Q''$ satisfying $\mathcal{P}_\alpha$ by applying the fix-up to all $i$ such that $Q'(i) > (1+\alpha)Q'(i-1)$. If
we start from some violating $i$ (until we reach some $k_i = i - d$ such that $Q'(k)$ does not need to
be fixed since $Q''(k+1) \le (1+\alpha)Q'(k)$), we know that before the fix-up, for each $1 \le d \le k_i$,
$Q'(i-d) \ge \frac{Q'(i)}{(1+\alpha+\tau)^d}$, and now, after the fix-up, $Q''(i) = Q'(i)$ and $Q''(i-d) = \frac{Q'(i)}{(1+\alpha)^d}$. The cost of
this increase is:
\[ Q''(i-d) - Q'(i-d) \le Q'(i) \cdot \left( \frac{1}{(1+\alpha)^d} - \frac{1}{(1+\alpha+\tau)^d} \right). \tag{10} \]
Using the fact that $(1+\alpha+\tau) = (1+\alpha)(1+\tau/(1+\alpha)) \le (1+\alpha)(1+\tau)$, so that $\frac{1}{1+\alpha+\tau} \ge \frac{1}{(1+\alpha)(1+\tau)}$,
we get
\begin{align*}
Q''(i-d) - Q'(i-d) &\le Q'(i) \cdot \frac{1}{(1+\alpha)^d} \cdot \left( 1 - \frac{1}{(1+\tau)^d} \right) \\
&= Q'(i) \cdot \frac{1}{(1+\alpha)^d} \cdot \frac{(1+\tau)^d - 1}{(1+\tau)^d} \\
&\le Q'(i) \cdot \frac{\tau d}{(1+\alpha)^d}
\end{align*}
(where the last inequality uses the fact that $(1+\tau)^d - 1 = d\tau + \binom{d}{2}\tau^2 + \cdots + d\tau^{d-1} + \tau^d$, which is
less than $d\tau \cdot (1+\tau)^d = d\tau + d^2\tau^2 + d\binom{d}{2}\tau^3 + \cdots + d^2\tau^d + d\tau^{d+1}$). Since, for $x \in [0,1)$,
\[ \sum_{\ell=1}^{n} \ell x^\ell \le \sum_{\ell=1}^{\infty} \ell x^\ell = \frac{x}{(1-x)^2}, \tag{11} \]
we get that
\[ \sum_{d=1}^{k_i} \left( Q''(i-d) - Q'(i-d) \right) \le Q'(i) \cdot \tau \sum_{d=1}^{k_i} \frac{d}{(1+\alpha)^d} \le Q'(i) \cdot \frac{\tau(1+\alpha)}{\alpha^2}. \tag{12} \]
By summing over all $i$ from which we start the (second) “fix-up,” we get an increase of at most
$\frac{\tau(1+\alpha)}{\alpha^2}$. By the triangle inequality, the total distance from $Q$ to $Q''$ is therefore at most
\[ \frac{1+\alpha+\tau}{\alpha+\tau}\, Q(W_\tau) + \frac{\tau(1+\alpha)}{\alpha^2}. \tag{13} \]
□


By leveraging Corollary 4.14, we are able to obtain an efficient approximation of the distance of a
distribution to the “exponential property”:

Theorem 4.15. There exists a constant $0 < c < 1$ such that, for any $\varepsilon > 0$: if $Q$ satisfies $\mathcal{P}_\alpha$ (where
$\alpha = c\varepsilon$), then with probability at least 2/3 Algorithm TestingExponentialProperty returns ACCEPT,
and if $Q$ is $\Omega(\varepsilon)$-far from $\mathcal{P}_\alpha$, then with probability at least 2/3 Algorithm TestingExponentialProperty
returns FAIL. The number of PAIRCOND queries performed by the algorithm

is ^(f)- 

Proof (sketch). The algorithm can be found in Algorithm 3. We here prove its correctness, before 
turning to its sample complexity. 
Correctness. Conditioning on the events of all calls to Compare returning a correct value (by
a union bound, this happens with probability at least 9/10), we have that:

• if $Q$ satisfies $\mathcal{P}_\alpha$, then for any sample $s_i > 1$, Compare can only return Low or a value $\rho$.
In the latter case, since $s_i \notin W = \emptyset$, it holds that $Q(s_i - 1) \le (1+\alpha)Q(s_i)$, and therefore
$\rho$ is at least the rejection threshold used in Algorithm 3 (this holds because of the choice of
$\eta$), and the algorithm does not reject;

• if however $Q$ is $\Omega(\varepsilon)$-far from $\mathcal{P}_\alpha$, Corollary 4.14 ensures that with probability at least 9/10
one of the samples will belong to $W_\tau$. For such an $s_i$, Compare will either return High
(and the algorithm will reject) or a value $\rho$. In the latter case, it will be the case that
$Q(s_i - 1) > (1+\alpha+\tau)Q(s_i)$, and thus $\rho$ falls below that threshold, and the algorithm will
reject.

The outcome of the algorithm will hence be correct with probability at least 3/4.

Sample complexity. By choice of a, m = 0(l/e 2 ) and r = 0(l/e 3 ); each of the m calls to 
Compare costs = o(^) = o(^). □ 


Algorithm 3 TestingExponentialProperty

Require: PAIRCOND access to $Q$, $\alpha \in [0,1)$    ▷ Useful for $\alpha = \Theta(\varepsilon) < 1$
Ensure: with probability at least 3/4 returns FAIL if $Q$ is $\Omega(\varepsilon)$-far from $\mathcal{P}_\alpha$, and ACCEPT if it
satisfies $\mathcal{P}_\alpha$.
  Set $\tau \stackrel{\rm def}{=} \varepsilon\alpha^2$
  Draw $m \stackrel{\rm def}{=} \Theta(1/\varepsilon^2)$ samples $s_1, \dots, s_m$ from $Q$    ▷ Contains an element from $W_\tau$ w.h.p.
  for $i = 1$ to $m$ do
    if $s_i \ge 2$ then
      Call Compare (from Lemma 2.6) on $\{s_i - 1\}$, $\{s_i\}$, with parameters $\eta$, $K = 2$ and $\delta$
      if the procedure outputs High then return FAIL
      else if it outputs a value $\rho$ then    ▷ $\frac{1}{2} \cdot Q(s_i) \le Q(s_i - 1) \le 2 \cdot Q(s_i)$
        if $\rho$ is below the rejection threshold then return FAIL
        end if
      end if
    end if
  end for
  return ACCEPT


5 With EVAL access 

In this section, we describe a $\mathrm{poly}(\log n, 1/\varepsilon)$-query tester for monotonicity in the Evaluation Query
model (EVAL), in which the testing algorithm is granted query access to the probability mass
function of the unknown distribution, but not the ability to sample from it.

Remark 5.1 (On the relation to $L_p$-testing for functions on the line). We observe that the results
of Berman et al. [BRY14] in testing monotonicity of functions with relation to $L_p$ distances do not
directly apply here. Indeed, while their work is concerned with functions $f\colon [n] \to [0,1]$
to which query access is granted, two main differences prevent us from using their techniques for
EVAL access to distributions: first, the distance they consider is normalized, by a factor $n$ in the
case of $L_1$ distance. A straightforward application of their result would therefore imply replacing
$\varepsilon$ by $\varepsilon' = \varepsilon/n$ in their statements, incurring a prohibitive sample complexity. Furthermore, even
adapting their techniques and structural lemmata is not straightforward, as distance to monotone
$[0,1]$-valued functions is not directly related to distance to monotone distributions: specifically, the
main tool leveraged in their reduction to Boolean Hamming testing ([BRY14, Lemma 2.1]) no
longer holds for distributions.


5.1 A $\mathrm{poly}(\log n, 1/\varepsilon)$-query tester for EVAL

We start by stating two results we shall use as subroutines, before stating and proving our theorem. 

Lemma 5.2. Given EVAL access to a monotone distribution $D$ over $[n]$, there exists a (non-adaptive)
algorithm$^{10}$ that, on input $\varepsilon$, makes $O(\log n/\varepsilon)$ queries and outputs a monotone distribution
$\hat{D}$ such that $d_{\rm TV}(D, \hat{D}) \le \varepsilon$. Furthermore, $\hat{D}$ is an $O(\log n/\varepsilon)$-histogram.


Proof. This follows from adapting the proof of Theorem 2.2 as follows: we consider the same
oblivious partition of $[n]$ in $\ell = O(\log n/\varepsilon)$ intervals, but instead of taking as in (2) the weight of a
point $i \in I_k$ to be the average $D(I_k)/|I_k|$, we consider the average of the endpoints of $I_k = (a_k, a_{k+1}]$:
\[ \forall k \in [\ell],\ \forall i \in I_k, \qquad \tilde{D}_\varepsilon(i) = \frac{D(a_k + 1) + D(a_{k+1})}{2}. \]
Clearly, this hypothesis can be (exactly) computed by making $O(\ell)$ EVAL queries. The result directly
follows from observing that, in the proof of his theorem, Birgé first upper bounds $\|\Phi_\varepsilon(D) - D\|_1$ by
$\|\tilde{D}_\varepsilon - D\|_1$, before showing the latter (which is the quantity we are interested in) is at most $2\varepsilon$ (see
[Bir87, Eq. (2.4)–(2.5)]). The last step to be taken care of is the fact that $\tilde{D}_\varepsilon$, as defined, might not
be a distribution, i.e., it may not sum to one. But as $\tilde{D}_\varepsilon$ is fully known, it is possible to efficiently
(and without taking any additional sample) compute the $\ell$-histogram monotone distribution $\hat{D}_\varepsilon$
which is closest to it. We are guaranteed that $\hat{D}_\varepsilon$ will be at most $4\varepsilon$-far from $\tilde{D}_\varepsilon$ in $L_1$ distance, as
there exists one particular distribution, namely $\Phi_\varepsilon(D)$, that is (being at a distance at most $2\varepsilon$ of
$D$ as well). Therefore, overall $\hat{D}_\varepsilon$ is a monotone distribution that is at most $6\varepsilon$-far from $D$ in $L_1$
distance, i.e. $d_{\rm TV}(D, \hat{D}_\varepsilon) \le 3\varepsilon$. □
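The first step of this proof, building the endpoint-average hypothesis from EVAL queries, can be sketched in Python. This is a sketch under assumptions: the oblivious partition is taken to consist of consecutive intervals of lengths $\lfloor (1+\varepsilon)^k \rfloor$ (the exact form of Definition 2.1 is not restated in this section), and the names `birge_partition` and `endpoint_average_hypothesis` are hypothetical. The projection onto the closest monotone histogram distribution is not included.

```python
import math


def birge_partition(n, eps):
    """Consecutive intervals (lo, hi) covering {1, ..., n}.

    Assumed form: the k-th interval has length floor((1 + eps)^k), at least 1,
    with the last interval truncated at n.
    """
    intervals, start, k = [], 1, 0
    while start <= n:
        length = max(1, math.floor((1 + eps) ** k))
        end = min(n, start + length - 1)
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def endpoint_average_hypothesis(eval_oracle, n, eps):
    """Return (hypothesis over 1..n as a list, number of EVAL queries made).

    On each interval the hypothesis is constant, equal to the average of the
    values of D at the two endpoints of the interval.
    """
    hypothesis, queries = [0.0] * n, 0
    for lo, hi in birge_partition(n, eps):
        value = (eval_oracle(lo) + eval_oracle(hi)) / 2
        queries += 2
        for i in range(lo, hi + 1):
            hypothesis[i - 1] = value
    return hypothesis, queries
```

The number of queries is twice the number of intervals, hence $O(\ell)$, and the result need not sum to one, which is why the proof adds the final projection step.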


Theorem 5.3 (Tolerant identity testing ([CR14, Remark 3 and Corollary 1])). Given EVAL access
to a distribution $D$ over $[n]$, there exists a (non-adaptive) algorithm that, on input $\varepsilon, \delta \in (0,1]$ and
the full specification of a distribution $D^\ast$, makes $O\!\left(\frac{1}{\varepsilon^2}\log\frac{1}{\delta}\right)$ queries and outputs a value $\hat{d}$ such
that, with probability at least $1 - \delta$, $\left| \hat{d} - d_{\rm TV}(D, D^\ast) \right| \le \varepsilon$.


Theorem 5.4. There exists an $O\!\left(\max\!\left(\frac{\log n}{\varepsilon}, \frac{1}{\varepsilon^2}\right)\right)$-query tester for monotonicity in the EVAL
model.

10 Recall that a non-adaptive tester is an algorithm whose queries do not depend on the answers obtained from 
previous ones, but only on its internal randomness. Equivalently, it is a tester that can commit “upfront” to all the 
queries it will make to the oracle. 
Proof. Calling Lemma 5.2 with accuracy parameter $\varepsilon/4$ enables us to learn a histogram $\hat{D}$ guaranteed,
if $D$ is monotone, to be $(\varepsilon/4)$-close to $D$; this by making $O(\log n/\varepsilon)$ queries. Using
Theorem 5.3, we can also get with $O(1/\varepsilon^2)$ queries an estimate $\hat{d}$ of $d_{\rm TV}(D, \hat{D})$, accurate up to an
additive $\varepsilon/4$ with probability at least 2/3. Combining the two, we get, with probability at least
2/3,

(i) $\hat{D}$, $(\varepsilon/4)$-close to $D$ if $D$ is monotone;

(ii) $\hat{d} \in [d_{\rm TV}(D, \hat{D}) - \varepsilon/4,\ d_{\rm TV}(D, \hat{D}) + \varepsilon/4]$.

We can now describe the testing algorithm: 

Algorithm 4 Algorithm TestMonEval
Require: EVAL access to $D$
1: Set $\alpha \stackrel{\rm def}{=} \varepsilon/4$, and compute $\mathcal{I}_\alpha$.

2: Get a candidate approximation of $D$ and test it for monotonicity, by:

(a) Applying Lemma 5.2 with parameter $\alpha$ to obtain $\hat{D}$, histogram on $\mathcal{I}_\alpha$;

(b) Checking (offline) whether $\hat{D}$ is $(\varepsilon/4)$-close to monotone; return FAIL if not.

3: Get an estimate $\hat{d}$ of $d_{\rm TV}(D, \hat{D})$ up to additive $\varepsilon/4$, as per (ii); return FAIL if $\hat{d} > \varepsilon/2$.

4: return ACCEPT


To argue correctness, it suffices to observe that, conditioning on the estimate being as accurate 
as required (which happens with probability at least 2/3): 

• if $D \in \mathcal{M}$, then $\hat{D} \in \mathcal{M}$ as well and we pass the first step. We also know by Lemma 5.2 that
in this case $d_{\rm TV}(D, \hat{D}) \le \alpha$, so that our estimate satisfies $\hat{d} \le \alpha + \varepsilon/4 = \varepsilon/2$. Therefore, the
algorithm does not reject here either, and eventually outputs ACCEPT.

• conversely, if the algorithm outputs ACCEPT, then we have both that (a) the distance of $\hat{D}$
to $\mathcal{M}$ is at most $\varepsilon/4$, and (b) $d_{\rm TV}(D, \hat{D}) \le \hat{d} + \varepsilon/4 \le 3\varepsilon/4$; so overall $d_{\rm TV}(D, \mathcal{M}) \le \varepsilon$.

As for the query complexity, it is straightforward from the setting of $\alpha = \Theta(\varepsilon)$ and the foregoing
discussion (recall that Step (b) can be performed efficiently, e.g. via linear programming ([BKR04,
Lemma 8])). □

5.2 An $\Omega(\log n)$ (non-adaptive) lower bound for EVAL

In this section, we show that, when focusing on non-adaptive testers, Theorem 5.4 is tight (note 
that the tester described in the previous section is, indeed, non-adaptive). 

Theorem 5.5. For any $\varepsilon \in (0,1/2)$, any non-adaptive $\varepsilon$-tester for monotonicity in the EVAL model
must perform $\Omega\!\left(\frac{\log n}{\varepsilon}\right)$ queries.

Proof. We hereafter assume without loss of generality that $n/3$ is a power of two; and shall define
a distribution over pairs of distributions, $\mathcal{D}$, such that the following holds. A random pair of
distributions $(D_1, D_2)$ drawn from $\mathcal{D}$ will have $D_1$ monotone, but $D_2$ $\varepsilon$-far from monotone. Yet, no
non-adaptive deterministic testing algorithm can distinguish with probability 2/3 (over the draw of
the distributions) between $D_1$ and $D_2$, unless it performs $c \log n$ EVAL queries. By Yao’s minimax
principle, this will guarantee that any non-adaptive randomized tester must perform at least $c \log n$
queries in the worst case.

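More specifically, a pair of distributions $(D_1, D_2)$ is generated as follows. A parameter $m$ is
chosen uniformly at random in the set
\[ M \stackrel{\rm def}{=} \left\{ \frac{2}{\kappa\varepsilon},\ \frac{2}{\kappa\varepsilon}(1+\kappa\varepsilon),\ \dots,\ \frac{2}{\kappa\varepsilon}(1+\kappa\varepsilon)^{k},\ \dots,\ \frac{n}{3} \right\}, \]
where $\kappa \stackrel{\rm def}{=} \frac{4}{1-2\varepsilon}$. $D_1$ is then set to be the uniform distribution on $\{1, \dots, (2+\kappa\varepsilon)m\}$; as for $D_2$, it
is defined as the histogram putting weight:

• $\frac{1}{2} - \varepsilon$ on $\{1, \dots, m\}$;

• $0$ on $I_m \stackrel{\rm def}{=} \{m+1, \dots, \lfloor (1+\frac{\kappa\varepsilon}{2})m \rfloor\}$;

• $2\varepsilon$ on $J_m \stackrel{\rm def}{=} \{\lfloor (1+\frac{\kappa\varepsilon}{2})m \rfloor + 1, \dots, \lfloor (1+\kappa\varepsilon)m \rfloor\}$;

• and $\frac{1}{2} - \varepsilon$ on $\{\lfloor (1+\kappa\varepsilon)m \rfloor + 1, \dots, \lfloor (2+\kappa\varepsilon)m \rfloor\}$.

It is not hard to see that $D_1$ is indeed monotone, and that the distance of $D_2$ from monotone is
exactly $\varepsilon$.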


[Figure: plot of $D_j(i)$ against $i$, with ticks at $m$, $(1+\kappa\varepsilon)m$ and $(2+\kappa\varepsilon)m$.]

Figure 1: Construction of $D_1$ (dotted) and $D_2$.

The key of the argument is to observe that if too few queries are made, then with high probability
over the choice of $m$ no queries will hit the interval $I_m \cup J_m$; and that conditioning on this, what
the tester sees in the yes- and no-cases is indistinguishable.
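The construction can be checked on a small instance with exact rational arithmetic. The sketch below is illustrative only: the function name `hard_pair` is hypothetical, and the value $\kappa = 4/(1-2\varepsilon)$ is an assumption of this example, chosen so that the two distributions coincide outside $I_m \cup J_m$. It expects `eps` to be a `Fraction` so that all weights are exact.

```python
from fractions import Fraction
from math import floor


def hard_pair(m, eps):
    """Return (D1, D2, (lo, hi)) where {lo, ..., hi} is the union of I_m and J_m.

    D1 and D2 are dicts mapping each point of the support to its weight.
    kappa = 4 / (1 - 2 * eps) is an assumption of this sketch.
    """
    kappa = 4 / (1 - 2 * eps)
    a = floor((1 + kappa * eps / 2) * m)  # right end of I_m
    b = floor((1 + kappa * eps) * m)      # right end of J_m
    c = floor((2 + kappa * eps) * m)      # right end of the support
    d1 = {i: Fraction(1, c) for i in range(1, c + 1)}
    blocks = [
        (1, m, Fraction(1, 2) - eps),      # first block
        (m + 1, a, Fraction(0)),           # I_m carries no weight
        (a + 1, b, 2 * eps),               # J_m carries weight 2 * eps
        (b + 1, c, Fraction(1, 2) - eps),  # last block
    ]
    d2 = {}
    for lo, hi, weight in blocks:
        for i in range(lo, hi + 1):
            d2[i] = weight / (hi - lo + 1)  # weight spread uniformly on the block
    return d1, d2, (m + 1, b)
```

With `m = 5` and `eps = Fraction(1, 4)`, both distributions live on $\{1, \dots, 20\}$, agree on every point outside $\{6, \dots, 15\}$, and are at total variation distance exactly $1/4$ from each other.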

Claim 5.6. LetT be a deterministic, non-adaptive algorithm making q < queries to the EVAL 
oracle. Then, the probability (over the choice of $m$) that a query hits $I_m \cup J_m$ is less than 1/3.

Proof. This follows from observing that the probability that any fixed point $x \in [n]$ belongs to
I m U J m is at most pyy = (1 + °(l))i 5 § 7 j) as this can on ly happen for at most one value of m (among 
$|M|$ equiprobable choices). By Markov’s inequality, this implies that the probability of any query
falling into $I_m \cup J_m$ is at most $\frac{1}{3}$ (for $n$ big enough). □
To see why the claim directly yields the theorem, observe that the above implies that for any
such algorithm,
\[ \left| \Pr_{D_1}\!\left[ T^{D_1} = \text{yes} \right] - \Pr_{D_2}\!\left[ T^{D_2} = \text{yes} \right] \right| < \frac{1}{3}. \]
But then $T$ cannot be a successful
monotonicity tester, as otherwise it would accept $D_1$ with probability at least 2/3, and $D_2$ with
probability at most 1/3. □


5.3 An $\tilde{\Omega}(\log n)$ (adaptive) lower bound for EVAL

While the above lower bound is tight, it only applies to non-adaptive testers; and it is natural to 
ask whether allowing adaptivity enables one to bypass this impossibility result - and, maybe, to 
get constant-query testers. The following result shows that it is not the case: even for constant e, 
adaptive testers must also make (almost) logarithmically many queries in the worst case. 

Theorem 5.7. There exist absolute constants $\varepsilon_0 > 0$ and $c > 0$ such that the following holds. Any
$\varepsilon_0$-tester for monotonicity in the EVAL model must perform $c \frac{\log n}{\log\log n}$ queries. (Furthermore, one
can take $\varepsilon_0 = 1/2$.)

Intuitively, if one attempts to design hard instances for this problem against adaptive algorithms,
one has to modify a yes-instance to get a no-instance by “removing” some probability weight and
“hiding” it somewhere else (where it will then violate monotonicity). The difficult part in doing so
does not lie in hiding that extra weight: one can always choose a random element $k \in \{n/2, \dots, n\}$
and add some probability to it. Arguing that any EVAL algorithm cannot find $k$ unless it makes
$\Omega(n)$ queries is then not difficult, as it is essentially tantamount to finding a needle in a haystack.

Thus, the key is to take some probability weight from a subset of points of the support, in
order to redistribute it. Note that this time it cannot be a local modification, as in a monotone
distribution one cannot obtain $\Omega(1)$ weight from a constant number of points unless these are
amongst the very first elements of the domain; and such a case is easy to detect with $O(1)$ queries.
Equivalently, we want to describe how to obtain two non-negative monotone sequences that are
hard to distinguish, one summing to one (i.e., already being a probability distribution) and the
other having sum bounded away from one (the slack giving us some “weight to redistribute”). To
achieve this, we will rely on the following result due to Sariel Har-Peled [Har15], whose proof is
reproduced below:$^{11}$

Proposition 5.8. Given query access to a sequence of non-negative numbers $a_n \ge \cdots \ge a_1$ and
$\varepsilon \in (0,1)$, along with the promise that either $\sum_{k=1}^{n} a_k = 1$ or $\sum_{k=1}^{n} a_k \le 1 - \varepsilon$, any (possibly
adaptive) randomized algorithm that distinguishes between the two cases with probability at least
2/3 must make $\Omega\!\left(\frac{\log n}{\log\log n}\right)$ queries in the worst case. (Moreover, the result even holds for $\varepsilon = 1/2$.)

Proof. Let $(a_k)_{k \in [n]}$ be a sequence defined as follows: we partition the sequence into $L$ blocks. In
the $i$-th block there are going to be $n_i$ elements (i.e., $\sum_i n_i = n$). Set the $i$-th block size to be
$n_i = L^i$, where $L \stackrel{\rm def}{=} \Theta(\log n/\log\log n)$ is the number of blocks. Let $\beta \stackrel{\rm def}{=} (2L-1)/(2L)$ be a
normalizing factor; an element in the $i$-th block has value $\alpha_i = \frac{\beta}{2L^{i+1}}$, so that the total sum of the
values in the sequence is $\beta/2 < 1/2$.

From $(a_k)_{k \in [n]}$, we obtain another sequence $(b_k)_{k \in [n]}$ by picking uniformly at random an arbitrary
block, say the $j$-th one, and setting all values in this block to be $\alpha_{j-1} = L\alpha_j$ (instead of $\alpha_j$). This
increases the contribution of the $j$-th block from $\beta/2L$ to $\beta/2$, and increases the total sum of the
sequence to $\beta\!\left(1 - \frac{1}{2L}\right) = 1$. Furthermore, it is straightforward to see that both sequences are indeed
non-decreasing.

11 For more results on approximating the discrete integral of sequences, including an upper bound for monotone
sequences reminiscent of Birgé’s oblivious decomposition, one may consult [Har06].

Informally, the idea is that to distinguish $(a_k)$ from $(b_k)$, any randomized algorithm must check
the value in each one of the blocks. As such, it must read at least $\Omega(L)$ values of the sequence. To
make the above argument more formal, with probability $p = 1/2$, give the original sequence of sum
1 as the input (we refer to this as original input). Otherwise, randomly select the block that has
the increased values (modified input). Clearly, if the randomized algorithm reads fewer than, say,
$L/8$ entries, it has probability (roughly) 1/8 to detect a modified input. As such, the probability
this algorithm fails, if it reads fewer than $L/8$ entries, is at least $(1-p)(7/8) \ge 7/16 > 1/3$. □

Proof of Theorem 5.7. To get Theorem 5.7 from Proposition 5.8, we define a reduction in the obvious
way: any EVAL monotonicity tester $\mathcal{T}$ can be used to solve the promise problem above by
first choosing uniformly at random an element $k$ in $\{2, \dots, n\}$, and then answering any query
$j \in [n] \setminus \{k\}$ from $\mathcal{T}$ by returning the value $a_j$. (This indeed defines a probability distribution
that is either monotone (if $\sum_k a_k = 1$) or far from it (if $\sum_k a_k = 1/2$): $k$ is the index where the
(possibly) extra weight 1/2 would have been “hidden,” in a no-instance; and is therefore the only
query point we cannot answer.) Conditioning on $k$ not being queried (which occurs with probability
$1 - O(1/n)$ given the random choice of $k$), it is straightforward to see that outputting the value
returned by $\mathcal{T}$ yields the correct answer with probability 2/3. From the above, any such $\mathcal{T}$ must
therefore have query complexity $\Omega\!\left(\frac{\log n}{\log\log n}\right)$. □

Open question. It is worth noting that a different construction, also due to [Har15], yields a
different lower bound of $\Omega(1/\varepsilon)$ for the promise problem of Proposition 5.8. Combining the two (and
applying the same reduction as above), we obtain a lower bound of $\Omega(\max(\log n/\log\log n, 1/\varepsilon))$
for testing monotonicity in the EVAL model. However, we do conjecture the right dependence on
$n$ to be logarithmic; more specifically, the author believes the above upper bound to be tight:

Conjecture 5.9. Monotonicity testing in the EVAL model has query complexity $\Theta\!\left(\max\!\left(\frac{\log n}{\varepsilon}, \frac{1}{\varepsilon^2}\right)\right)$.

6 With Cumulative Dual access 

Theorem 6.1. There exists an $\tilde{O}\!\left(\frac{1}{\varepsilon^4}\right)$-query (independent of $n$) tester for monotonicity in the
Cumulative Dual model.

Proof. We first give the overall structure of the tester, which is, unsurprisingly, very similar to the ones
in Section 4.2 and Section 5.1:

Algorithm 5 Algorithm TestMonCumulative

Require: CEVAL and SAMP access to $D$

1: Set $\alpha \stackrel{\rm def}{=} \varepsilon/4$, and compute $\mathcal{I}_\alpha$.

2: Test if $\Phi_\alpha(D)$ is $(\varepsilon/4)$-close to monotone by testing $(\varepsilon/4)$-closeness (of $D^{\rm red}_\alpha$) to $\mathcal{P}_\alpha$; return
FAIL if the tester rejects.

3: Get an estimate $\hat{d}$ of $d_{\rm TV}(D, \Phi_\alpha(D))$ up to additive $\varepsilon/4$; return FAIL if $\hat{d} > \varepsilon/2$.

4: return ACCEPT
Before diving into the actual implementation of Steps 2 and 3, we first argue that, conditioned
on their outcome being correct, TestMonCumulative outputs the correct answer. The argument
is almost identical to the one in the proof of Theorem 5.4:

• if $D \in \mathcal{M}$, then $\Phi_\alpha(D) \in \mathcal{M}$ as well; by Remark 4.8 it follows that $D^{\rm red}_\alpha \in \mathcal{P}_\alpha$ and we pass
the first step. We also know (Theorem 2.2) that $d_{\rm TV}(D, \Phi_\alpha(D)) \le \alpha$, so that our estimate
satisfies $\hat{d} \le \alpha + \varepsilon/4 = \varepsilon/2$. Therefore, the algorithm does not reject here either, and
eventually outputs ACCEPT.

• conversely, if the algorithm outputs ACCEPT, then we have both that (a) the distance of
$\Phi_\alpha(D)$ to $\mathcal{M}$ is at most $\varepsilon/4$, and (b) $d_{\rm TV}(D, \Phi_\alpha(D)) \le \hat{d} + \varepsilon/4 \le 3\varepsilon/4$; so overall
$d_{\rm TV}(D, \mathcal{M}) \le \varepsilon$.

It remains to show how to perform Steps 2 and 3: namely, testing $D^{\rm red}_\alpha$ for $\mathcal{P}_\alpha$ given CEVAL and
SAMP access to $D$, and approximating $d_{\rm TV}(D, \Phi_\alpha(D))$.


Testing $\frac{\varepsilon}{4}$-closeness to $\mathcal{P}_\alpha$. This part is performed similarly as in Section 4.2.4, observing that
one can easily simulate access to $\mathrm{PAIRCOND}_Q$ from a $\mathrm{CEVAL}_Q$ oracle. Indeed, Lemma 4.11 implies
that when $Q$ is $\varepsilon$-far from having the property, it suffices to sample points according to
$Q = D^{\rm red}_\alpha$ and compare them to their neighbors to detect a violation with probability at least 9/10.
Note that this last test is easy, as we have query access to $Q$ (recall that we have a $\mathrm{CEVAL}_D$ oracle,
and that $D^{\rm red}_\alpha(k) = D(I_k)$).


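Efficient approximation of distance to $\Phi_\alpha(D)$. Let $D$, $\varepsilon$ and $\mathcal{I}_\alpha$ be as before; define $Z$ to be
a random variable taking values in $[0,1]$, such that, for $k \in [\ell]$, $Z$ is equal to $d_{\rm TV}(D_{I_k}, U_{I_k})$ with
probability $w_k = D(I_k)$. It follows that
\begin{align*}
\mathbb{E} Z = \sum_{k=1}^{\ell} w_k\, d_{\rm TV}(D_{I_k}, U_{I_k})
 &= \frac{1}{2} \sum_{k=1}^{\ell} w_k \sum_{i \in I_k} \left| D_{I_k}(i) - \frac{1}{|I_k|} \right| \\
 &= \frac{1}{2} \sum_{k=1}^{\ell} \sum_{i \in I_k} \left| D(i) - \frac{D(I_k)}{|I_k|} \right| \\
 &= d_{\rm TV}(D, \Phi_\alpha(D)). \tag{14}
\end{align*}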


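Furthermore, one can simulate $m = O(1/\varepsilon^2)$ i.i.d. draws from $Z$ by repeating independently the
following for each of them:

• draw $i \sim D$ by calling $\mathrm{SAMP}_D$, and look up the $k$ for which $i \in I_k$;

• get the value $D(I_k)$ with 2 CEVAL queries (note that $D(I_k) > 0$, as we just got a sample from
$I_k$);

• estimate $d_{\rm TV}(D_{I_k}, U_{I_k})$ up to $\pm\varepsilon$ (with failure probability at most $\frac{1}{10m}$) by drawing $O(1/\varepsilon^2)$
uniform samples from $I_k$ and querying the values of $D$ on them, to estimate
\begin{align*}
d_{\rm TV}(D_{I_k}, U_{I_k})
 &= \sum_{j \in I_k} \left| D_{I_k}(j) - \frac{1}{|I_k|} \right| \mathbb{1}_{\{ |I_k| D_{I_k}(j) < 1 \}}
  = \sum_{j \in I_k} \left| \frac{D(j)}{D(I_k)} - \frac{1}{|I_k|} \right| \mathbb{1}_{\{ |I_k| D_{I_k}(j) < 1 \}} \\
 &= \mathbb{E}_{j \sim U_{I_k}} \left[ \left| \frac{D(j)\,|I_k|}{D(I_k)} - 1 \right| \cdot \mathbb{1}_{\left\{ \frac{D(j)\,|I_k|}{D(I_k)} < 1 \right\}} \right].
\end{align*}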
Applying a Chernoff bound (and a union bound over all such simulated draws), performing this
only $m$ times is sufficient to get an $\varepsilon$-additive estimate of $\mathbb{E} Z$ with probability at least 9/10, for an
overall $O(1/\varepsilon^4)$ number of queries (sample and evaluation).
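The estimator described above can be sketched in Python on a fully known distribution, standing in for the SAMP and CEVAL oracles. This is a minimal sketch under assumptions: the name `estimate_flattening_distance`, the explicit list of intervals, and the parameters `outer` and `inner` (numbers of simulated draws of $Z$ and of uniform samples per draw) are choices of this example, and no particular setting of these parameters is claimed.

```python
import bisect
import random


def estimate_flattening_distance(pmf, intervals, outer, inner, rng):
    """Monte Carlo estimate of E[Z], i.e. of the distance between D and its flattening.

    pmf[i - 1] is D(i); intervals is a sorted list of (lo, hi) pairs covering 1..n.
    Prefix sums of pmf play the role of the CEVAL oracle.
    """
    n = len(pmf)
    cdf = [0.0]
    for p in pmf:
        cdf.append(cdf[-1] + p)
    right_ends = [hi for _, hi in intervals]
    total = 0.0
    for _ in range(outer):
        i = rng.choices(range(1, n + 1), weights=pmf)[0]  # one SAMP call
        lo, hi = intervals[bisect.bisect_left(right_ends, i)]
        weight = cdf[hi] - cdf[lo - 1]                    # D(I_k), two CEVAL calls
        size = hi - lo + 1
        acc = 0.0
        for _ in range(inner):
            j = rng.randint(lo, hi)                       # uniform point of I_k
            ratio = size * pmf[j - 1] / weight
            if ratio < 1:                                 # indicator in the formula
                acc += 1 - ratio
        total += acc / inner
    return total / outer
```

On a uniform distribution every term vanishes, so the estimate is exactly zero, matching the fact that a uniform distribution equals its own flattening.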

Proof of Theorem 6.1. Correctness has already been argued, provided both subroutines do not
err; by a union bound, the “good event” that the two steps produce such an outcome happens with
probability at least 4/5. The query complexity is the sum of the cost of Step 2 and $O(1/\varepsilon^4)$
(Step 3), yielding overall the claimed $\tilde{O}(1/\varepsilon^4)$ sample complexity. □
Dana Ron and Rocco Servedio, and owes its existence to them; similarly, the contents of Section 6
A Models of distribution testing: definitions


In this appendix, we define precisely the notion of testing of distributions over a domain $\Omega$ (e.g.,
$\Omega = [n]$), and the different models of access covered in this work.

Recall that a property $\mathcal{P}$ of distributions over $\Omega$ is a set consisting of all distributions that
have the property. The distance from $D$ to a property $\mathcal{P}$, denoted $d_{\rm TV}(D, \mathcal{P})$, is then defined as
$\inf_{D' \in \mathcal{P}} d_{\rm TV}(D, D')$. We use the standard definition of testing algorithms for properties of distributions
over $\Omega$, where $n$ is the relevant parameter for $\Omega$ (i.e., in our case, its size $|\Omega|$). We chose here
to phrase it in the most general setting possible with regard to how the unknown distribution is
“queried,” and will specify this aspect further in the next paragraphs (sampling access, conditional
access, etc.).

Definition A.1. Let $\mathcal{P}$ be a property of distributions over $\Omega$. Let $\mathrm{ORACLE}_D$ be an oracle providing
some type of access to $D$. A $q$-query $\mathrm{ORACLE}_D$ testing algorithm for $\mathcal{P}$ is a randomized algorithm
$T$ which takes as input $n$, $\varepsilon \in (0,1]$, as well as access to $\mathrm{ORACLE}_D$. After making at most $q(\varepsilon, n)$
calls to the oracle, $T$ either outputs ACCEPT or REJECT, such that the following holds:

• if $D \in \mathcal{P}$, $T$ outputs ACCEPT with probability at least 2/3;

• if $d_{\rm TV}(D, \mathcal{P}) \ge \varepsilon$, $T$ outputs REJECT with probability at least 2/3.

A.1 The SAMP model

In this first and most common setting, the testers access the unknown distribution by getting
independent and identically distributed (i.i.d.) samples from it.

Definition A.2 (Standard access model (sampling)). Let $D$ be a fixed distribution over $\Omega$. A
sampling oracle for $D$ is an oracle $\mathrm{SAMP}_D$ defined as follows: when queried, $\mathrm{SAMP}_D$ returns an
element $x \in \Omega$, where the probability that $x$ is returned is $D(x)$ independently of all previous calls
to the oracle.

This definition immediately implies that all algorithms in this model are in essence non-adaptive:
indeed, any tester or tolerant tester can be converted into a non-adaptive one, without affecting
the sample complexity. (This is a direct consequence of the fact that all an adaptive algorithm can
do when interacting with a SAMP oracle is deciding to stop asking for samples, based on the ones
it already got, or continue.)

A.2 The COND model 

Definition A.3 (Conditional access model [CFGM13, CRS12]). Fix a distribution $D$ over $\Omega$. A
COND oracle for $D$, denoted $\mathrm{COND}_D$, is defined as follows: the oracle takes as input a query set
$S \subseteq \Omega$, chosen by the algorithm, that has $D(S) > 0$. The oracle returns an element $i \in S$, where
the probability that element $i$ is returned is $D_S(i) = D(i)/D(S)$, independently of all previous calls
to the oracle.

Note that as described above the behavior of $\mathrm{COND}_D(S)$ is undefined if $D(S) = 0$, i.e., the
set $S$ has zero probability under $D$. Various definitional choices could be made to deal with this:
e.g., Canonne et al. [CRS12] assume that in such a case the oracle (and hence the algorithm)
outputs “failure” and terminates, while Chakraborty et al. [CFGM13] define the oracle to return
in this case a sample uniformly distributed in $S$. In most situations, this distinction does not make
any difference, as most algorithms can always include in their next queries a sample previously
obtained; however, the former choice does rule out the possibility of non-adaptive testers taking
advantage of the additional power COND provides over SAMP, while such testers are part of the
focus of [CFGM13].

Testing algorithms can often only be assumed to have the ability to query sets $S$ that have
some sort of “structure,” or are in some way “simple.” To capture this, one can define specific
restrictions of the general COND model, which do not allow arbitrary sets to be queried but instead
enforce some constraints on the queries: [CRS12] introduces and studies two such restrictions,
“PAIRCOND” and “INTCOND.”

Definition A.4 (Restricted conditional oracles). A PAIRCOND (“pair-cond”) oracle for $D$ is a
restricted version of $\mathrm{COND}_D$ that only accepts input sets $S$ which are either $S = \Omega$ (thus providing
the power of a $\mathrm{SAMP}_D$ oracle) or $S = \{x, y\}$ for some $x, y \in \Omega$, i.e. sets of size two.

In the specific case of $\Omega = [n]$, an INTCOND (“interval-cond”) oracle for $D$ is a restricted version
of $\mathrm{COND}_D$ that only accepts input sets $S$ which are intervals $S = [a, b] = \{a, a+1, \dots, b\}$ for some
$a \le b \in [n]$ (note that taking $a = 1$, $b = n$ this provides the power of a $\mathrm{SAMP}_D$ oracle).

A.3 The EVAL, Dual and Cumulative Dual models 

Definition A.5 (Evaluation model [RS09]). Let $D$ be a fixed distribution over $\Omega$. An evaluation
oracle for $D$ is an oracle $\mathrm{EVAL}_D$ defined as follows: the oracle takes as input a query element $x \in \Omega$,
and returns the probability weight $D(x)$ that the distribution puts on $x$.

Definition A.6 (Dual access model [BDKR05, GMV06, CR14]). Let $D$ be a fixed distribution
over $\Omega$. A dual oracle for $D$ is a pair of oracles $(\mathrm{SAMP}_D, \mathrm{EVAL}_D)$ defined as follows: when queried,
the sampling oracle $\mathrm{SAMP}_D$ returns an element $x \in \Omega$, where the probability that $x$ is returned is
$D(x)$ independently of all previous calls to any oracle; while the evaluation oracle $\mathrm{EVAL}_D$ takes as
input a query element $y \in \Omega$, and returns the probability weight $D(y)$ that the distribution puts
on $y$.

Definition A.7 (Cumulative Dual access model [BKR04, CR14]). Let $D$ be a fixed distribution
over $\Omega = [n]$. A cumulative dual oracle for $D$ is a pair of oracles $(\mathrm{SAMP}_D, \mathrm{CEVAL}_D)$ defined as
follows: the sampling oracle $\mathrm{SAMP}_D$ behaves as before, while the evaluation oracle $\mathrm{CEVAL}_D$ takes
as input a query element $j \in [n]$, and returns the probability weight that the distribution puts on
$[j]$, that is $D([j]) = \sum_{i=1}^{j} D(i)$.

(Note that, as the latter requires some total ordering on the domain, it is only defined for
distributions over $[n]$; as was the INTCOND oracle from Definition A.4.)
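The access models defined in this appendix can be mimicked, for experimentation on a known distribution over $[n]$, by a small Python class. This is only a toy sketch: the class name `DistributionOracles`, its method names, and the choice of raising an exception when COND is queried on a set of zero probability (one of the conventions discussed above) are assumptions of this example.

```python
import random


class DistributionOracles:
    """Toy SAMP / COND / PAIRCOND / INTCOND / EVAL / CEVAL oracles over {1, ..., n}."""

    def __init__(self, pmf, seed=0):
        self.pmf = list(pmf)  # pmf[i - 1] = D(i)
        self.n = len(self.pmf)
        self.rng = random.Random(seed)

    def samp(self):
        # SAMP is COND on the whole domain.
        return self.cond(range(1, self.n + 1))

    def cond(self, query_set):
        # Return i in S with probability D(i) / D(S).
        elements = sorted(set(query_set))
        weights = [self.pmf[i - 1] for i in elements]
        if sum(weights) <= 0:
            raise ValueError("COND is undefined on sets of zero probability")
        return self.rng.choices(elements, weights=weights)[0]

    def pair_cond(self, x, y):
        return self.cond({x, y})

    def int_cond(self, a, b):
        return self.cond(range(a, b + 1))

    def eval(self, x):
        return self.pmf[x - 1]

    def ceval(self, j):
        # Probability weight of the prefix {1, ..., j}.
        return sum(self.pmf[:j])
```

For example, with `pmf = [0.5, 0.25, 0.25, 0.0]`, `pair_cond(2, 4)` can only return 2, and `ceval(3)` returns the full mass 1.0.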

B Tolerant testing with Dual and Cumulative Dual access 

In this appendix, we show how similar ideas can yield tolerant testers for monotonicity, as long
as the access model admits both an agnostic learner for monotone distributions and an efficient
distance approximator. As this is the case for both the Dual and Cumulative Dual oracles, this
allows us to derive such tolerant testers with logarithmic query complexity. (Note that these results
imply that, in both models, tolerant testing monotonicity of an arbitrary distribution is no harder
than learning an actually monotone distribution.)$^{12}$

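Theorem B.1. Fix any constant $\gamma > 0$. In the dual access model, there exists an $(\varepsilon_1, \varepsilon_2)$-tolerant
tester for monotonicity with query complexity $O\!\left(\frac{\log n}{\varepsilon_2^3}\right)$, provided $\varepsilon_2 > (3+\gamma)\varepsilon_1$.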

Proof. We will use here the result of Theorem 5.3 (with its probability of success slightly increased
to 5/6 by standard techniques), as well as the Birgé decomposition of $[n]$ (with parameter $\Theta(\varepsilon_2)$)
from Definition 2.1. The tester is described in Algorithm 6, and follows a very simple idea: leveraging
the robustness of the Birgé flattening, it first agnostically learns an approximation $\hat{D}$ of the
distribution and computes (offline) its distance to monotonicity. Then, using the (efficient) tolerant
identity tester available in both dual and cumulative dual models, it estimates the distance between
$\hat{D}$ and $D$: if both distances are small, the triangle inequality allows us to conclude $D$ must be close
to monotone.

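Algorithm 6 Algorithm TolerantTestMonotonicity

Require: $\mathrm{SAMP}_D$ and $\mathrm{EVAL}_D$ oracle access, parameters $0 \le \varepsilon_1 < \varepsilon_2$ such that $\varepsilon_2 > (3+\gamma)\varepsilon_1$
1: Set $\alpha$ to a sufficiently small constant (depending on $\gamma$), $\gamma_1 \stackrel{\rm def}{=} 2\varepsilon_1 + 2\alpha\varepsilon_2$ and $\gamma_2 \stackrel{\rm def}{=} (1-\alpha)\varepsilon_2 - \varepsilon_1$.    ▷ So that $\gamma_2 - \gamma_1 = \Omega(\varepsilon_2)$
2: Learn $\Phi_{\alpha\varepsilon_2}(D)$ to distance $\alpha\varepsilon_2$, getting a (piecewise constant) $\hat{D}$.    ▷ uses SAMP queries
3: Test if $\hat{D}$ is $(\varepsilon_1 + \alpha\varepsilon_2)$-close to monotone; return FAIL if not.    ▷ no sample needed (LP)
4: Test if $\hat{D}$ is $\gamma_1$-close to $D$ vs. $\gamma_2$-far; return FAIL if far.    ▷ Tester from Theorem 5.3.
5: return ACCEPT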


Correctness. Suppose the execution of each step is correct; by standard Chernoff bounds, this
event for Step 2 (that is, our estimate $\hat{D}$ of $\Phi_{\alpha\varepsilon_2}(D)$ being indeed $\alpha\varepsilon_2$-accurate) happens with probability
at least 5/6; furthermore, Step 3 is deterministic, and Step 4 only fails with probability 1/6,
so the overall “good event” happens with probability at least 2/3. We hereafter condition on this.

• if $d_{\rm TV}(D, \mathcal{M}) \le \varepsilon_1$, then according to Corollary 2.4:

(a) $d_{\rm TV}(\Phi_{\alpha\varepsilon_2}(D), \mathcal{M}) \le \varepsilon_1$, so that $d_{\rm TV}(\hat{D}, \mathcal{M}) \le \varepsilon_1 + \alpha\varepsilon_2$ and we pass Step 3; and

(b) $d_{\rm TV}(D, \Phi_{\alpha\varepsilon_2}(D)) \le 2\varepsilon_1 + \alpha\varepsilon_2$, which in turn implies that $d_{\rm TV}(D, \hat{D}) \le 2\varepsilon_1 + 2\alpha\varepsilon_2 = \gamma_1$;
and we pass Step 4.

Therefore, the tester eventually outputs ACCEPT.

• Conversely, if the tester accepts, it means that

(a) $d_{\rm TV}(\hat{D}, \mathcal{M}) \le \varepsilon_1 + \alpha\varepsilon_2$ and

(b) $d_{\rm TV}(D, \hat{D}) \le \gamma_2 = (1-\alpha)\varepsilon_2 - \varepsilon_1$;

hence, by the triangle inequality, $d_{\rm TV}(D, \mathcal{M}) \le \varepsilon_2$.

12 It is however worth noting that, because of the restriction $\varepsilon_2 > 3\varepsilon_1$ it requires, our tolerant testing result does
not imply a distance estimation algorithm.
Query complexity. The algorithm makes $O\!\left(\frac{\log n}{\varepsilon_2^3}\right)$ SAMP queries in Step 2, and $m = O\!\left(\frac{1}{(\gamma_2 - \gamma_1)^2}\right) = O\!\left(\frac{1}{\varepsilon_2^2}\right)$
EVAL queries in Step 4. □
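The role of the thresholds $\gamma_1$ and $\gamma_2$ can be illustrated with exact arithmetic. In the sketch below, the function name `tolerant_thresholds` and the specific choice $\alpha = \gamma / (6(3+\gamma))$ are assumptions of this example (any sufficiently small constant $\alpha$ would do). With that choice, $\gamma_2 - \gamma_1 = (1-3\alpha)\varepsilon_2 - 3\varepsilon_1 \ge \frac{\gamma}{2(3+\gamma)}\varepsilon_2$ whenever $\varepsilon_2 > (3+\gamma)\varepsilon_1$, so the gap is indeed $\Omega(\varepsilon_2)$.

```python
from fractions import Fraction


def tolerant_thresholds(eps1, eps2, gamma):
    """Return (alpha, gamma1, gamma2) for the tolerant tester.

    alpha = gamma / (6 * (3 + gamma)) is an assumed choice for this sketch;
    gamma1 and gamma2 follow the definitions in Algorithm 6.
    """
    if eps2 <= (3 + gamma) * eps1:
        raise ValueError("requires eps2 > (3 + gamma) * eps1")
    alpha = Fraction(gamma) / (6 * (3 + gamma))
    gamma1 = 2 * eps1 + 2 * alpha * eps2
    gamma2 = (1 - alpha) * eps2 - eps1
    return alpha, gamma1, gamma2
```

For instance, with $\varepsilon_1 = 1/10$, $\varepsilon_2 = 1/2$ and $\gamma = 1$, the thresholds are $\gamma_1 = 29/120$ and $\gamma_2 = 91/240$; one can also check that $(\varepsilon_1 + \alpha\varepsilon_2) + \gamma_2 = \varepsilon_2$, which is the triangle-inequality budget used in the correctness argument.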

Remark B.2. With CEVAL access (instead of EVAL), the cost of Step 2 would be reduced from
$O(\log n/\varepsilon_2^3)$ sample queries to get an $(\alpha\varepsilon_2)$-approximation to $O(\log n/\varepsilon_2)$ cdf queries to get the
exact flattened distribution (as we have only that many quantities of the form $D(I_k)$ to learn, and
each of them requires only 2 cdf queries). This leads to the following corollary:

Corollary B.3. Fix any constant $\gamma > 0$. In the Cumulative Dual access model, there exists
an $(\varepsilon_1, \varepsilon_2)$-tolerant tester for monotonicity with query complexity $O\!\left(\frac{1}{\varepsilon_2^2} + \frac{\log n}{\varepsilon_2}\right)$, provided
$\varepsilon_2 > (3+\gamma)\varepsilon_1$.